:PROPERTIES:
:ID: e9be16f7-8032-4509-9aa9-7843836eacd9
:END:
#+title: domain
#+filetags: :neuralnomicon:
#+SETUPFILE: https://fniessen.github.io/org-html-themes/org/theme-readtheorg.setup
* ARCHITECTURE
** PHYSICS - CHEMICAL - PARTICLES
- Learning to Simulate Complex Physics with Graph Networks: https://arxiv.org/pdf/2002.09405
- [[https://github.com/xuan-li/PAC-NeRF][PAC-NeRF]]: [[https://arxiv.org/abs/2303.05512][Physics]] Augmented Continuum Neural Radiance Fields
- Scaling Spherical CNNs: vs graph neural networks on molecular tasks
  - convolutions computed in the spectral domain via the convolution theorem
*** POINT CLOUD
:PROPERTIES:
:ID: 5295bde1-5eb3-4e2b-842b-ee415c831f94
:END:
- [[id:35add1fe-b835-49c7-99f4-8aa4321a3904][SUPERPRIMITIVE]]
- [[https://x.com/dreamingtulpa/status/1852640716377223387][MoGe]]: turns images and videos into 3D point maps
**** PIXEL ALIGNMENT
:PROPERTIES:
:ID: c724444e-f844-4d34-ae6f-e76239bdadbd
:END:
- [[https://dust3r.europe.naverlabs.com/][DUSt3R]]: [[https://github.com/naver/dust3r][Geometric]] 3D Vision Made Easy
  - global alignment of pixels from sparse views, no camera poses needed
**** POINT CLOUD DIFFUSION
- [[https://twitter.com/_akhaliq/status/1668806682783170560][3D molecule]] generation by denoising voxel grids
  - diffusion model applied to atom point clouds
- [[https://arxiv.org/abs/2402.11241][DiffPoint]]: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model
  - divides the noisy point cloud into irregular patches, predicts target points based on input images
- [[https://arxiv.org/abs/2404.03566][PointInfinity]]: Resolution-Invariant Point Diffusion Models
  - efficient training on low-resolution point clouds, allowing high-resolution generation during inference
  - transformer-based architecture with a fixed-size, resolution-invariant latent representation
** BIOLOGY - ANALOG
- biology-inspired AI: genes or local-context evolution, CPPN algorithm https://youtu.be/vf18FLdKkY4
  - next paper: hyper
    http://axon.cs.byu.edu/~dan/778/papers/NeuroEvolution/stanley3**.pdf
  - continuation: http://eplex.cs.ucf.edu/ESHyperNEAT/
- NEAT algorithm: https://youtu.be/3nbvrrdymF0
- [[https://youtu.be/vf18FLdKkY4][The AI Epiphany]]: NeuroEvolution of Augmenting Topologies (NEAT) and Compositional Pattern Producing Networks (CPPN)
- [[https://www.cs.toronto.edu/~hinton/FFA13.pdf][Forward-Forward]] ([[https://twitter.com/martin_gorner/status/1599755684941557761][vs Backpropagation]]): suited to analog computers
** ENERGY BASED - JEPA
- https://youtu.be/jSdHmImyUjk
  - Self-Supervised Learning, Energy-Based Models, and hierarchical predictive models
  - the encoder learns to ignore useless information
  - https://openreview.net/forum?id=BZ5a1r-kVsf
- [[https://arxiv.org/abs/2302.07253][Energy Transformer]]: more efficient, electric
- transformers without skip connections or normalisation layers: https://arxiv.org/pdf/2302.10322.pdf
- Conformers: local and global attention
** OTHER ALTERNATIVES
- [[id:88490b18-3eaf-402d-b8ef-eca7a125ce93][PAE - PHASE]]
- [[id:bd80ad1d-64de-4445-98e8-0cec31e1ab32][STATE SPACE]]
* APPLICATION
- [[id:8f2a9969-2221-4993-9113-6e5e3e5874f5][NERF ALIKES]]: representations for video and images
- [[https://twitter.com/skalskip92/status/1747716106226147789][supervision]]: measuring the [[https://github.com/roboflow/supervision][time]] objects spend in a zone
  - e.g. employees at their desks, cars in a parking lot
** UI
- [[https://arxiv.org/abs/2402.11627][Interactive Garment]] Recommendation with User in the Loop (algorithm)
  - ingests user feedback to improve its recommendations and maximize user satisfaction
*** OS CONTROL
- [[https://twitter.com/_akhaliq/status/1757239969423175839][OS-Copilot]]: Towards Generalist Computer Agents with Self-Improvement
  - strong generalization to unseen applications via skills accumulated from previous tasks
- [[https://twitter.com/_akhaliq/status/1778605019362632077][OSWorld]]: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
  - accomplish complex computer tasks with minimal human intervention
  - multimodal agents
**** WEBSITE CONTROL
- [[https://twitter.com/hellokillian/status/1749875183173947575][open-source]] Rabbit
- [[https://twitter.com/_akhaliq/status/1750707472703045724][WebVoyager]]: Building an End-to-End Web Agent with Large Multimodal Models
  - interacts with real-world websites
*** DRAGGING UI
:PROPERTIES:
:ID: 62b0c837-182d-4d2c-9289-ce7259330e08
:END:
- DragGAN, [[id:208c064d-f700-4e8f-a4ab-2c73c557f9e3][DRAG]], [[id:6d9df8fd-398d-44da-b7d7-cd7146b1b7a8][DAG]]
- [[id:dc7fd8d1-1461-4882-a671-a5935e3d15b5][DRAGTEX]]: mesh
- [[id:044077e6-d2ea-415f-9ec0-5ae727626dc1][DRAGANYTHING]]: motion control in video
*** HAPTIC
- [[https://twitter.com/_akhaliq/status/1738015884864569632][Neural]] feels with neural fields: Visuo-tactile perception for in-hand manipulation
  - tracking and reconstruction of novel objects for in-hand manipulation
- [[https://twitter.com/_akhaliq/status/1739513038875562065][MACS]]: Mass Conditioned 3D Hand and Object Motion Synthesis
  - improves the naturalness of synthesized 3D hand-object motions
  - generalizes to unseen masses
- [[https://twitter.com/_akhaliq/status/1760891745036902627][CyberDemo]]: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
  - simulated human demonstrations for real-world tasks
*** STORYTELLING
- [[https://browse.arxiv.org/abs/2402.00319][SCO-VIST]]: Social Interaction Commonsense Knowledge-based Visual Storytelling
  - takes a graph of plot points and creates bridges between them
- [[id:19d99453-7d66-41b1-80e8-fbe91d035084][OPEN-VOCABULARY]] [[id:bdd9160a-2438-4af0-a6f9-618b87096727][STORYTELLING CAPTIONING]]
- [[id:65812d6a-a81d-47f2-a7ad-25c94e2ff70a][STORYTELLER DIFFUSION]]
- [[id:cc058fea-c2dd-4f7f-aa59-156825bed0ef][CARTOON]] [[id:56a81747-2a44-410e-9ca0-26f366829f3e][INTO MANGA]] [[id:247ce640-4a9c-4cb9-94f9-d013848f47ce][COLORIZATION]]
- [[id:3a1a687e-63e6-4552-8803-a06deeb494c6][LAYOUT LLM]] [[id:dafb1713-5d08-40de-b445-76d25f2cf070][LAYOUT DIFFUSION]]
** FACE
*** FACE RECOGNITION
- PAM: [[https://arxiv.org/abs/2403.19980][A Parallel]] Attention Network for Cattle Face Recognition
  - focuses on local and global features
  - for animal husbandry and behavioral research
*** FACE SWAP
- [[https://arxiv.org/pdf/2402.03188.pdf][Towards]] mitigating uncann(eye)ness in face swaps via gaze-centric loss terms
  - novel loss term for training face-swapping models
*** EMOTIONS
:PROPERTIES:
:ID: e57dbe75-a5bd-48f7-94ee-4935bdcc82e1
:END:
- [[https://arxiv.org/abs/2404.01243][A Unified]] and Interpretable Emotion Representation and Expression Generation
  - compound emotions
** SPEECH RECOGNITION
:PROPERTIES:
:ID: daae8285-8325-4096-b421-61bb9df79d4a
:END:
- [[id:aeca80bb-38f3-4343-a214-67e3b4df245e][CAPTIONING]]
- [[https://twitter.com/_akhaliq/status/1691712447533740338][Text Injection]] for Capitalization and Turn-Taking Prediction in Speech Models
  - unpaired text-only data used to enhance paired audio-text data
  - to detect turns
- [[https://opentalker.github.io/video-retalking/][VideoReTalking]]: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
** GEOGRAPHY
- [[https://twitter.com/_akhaliq/status/1698509330117460368][CityDreamer]]: Compositional Generative Model of Unbounded 3D Cities (imagines the city's map layout)
- [[id:b27d9d84-d4a1-49b3-9c03-be873e8aa18b][GENERATE BLENDER]]
*** NEURAL MAPPING
- [[https://twitter.com/_akhaliq/status/1697529656004534400][Active]] Neural Mapping: scene reconstruction while actively gaining knowledge of the environment
- [[https://twitter.com/_akhaliq/status/1699336788060381300][Doppelgangers]]: Learning to Disambiguate Images of Similar Structures
  - distinguishes illusory matches in difficult cases, then spatially distributes local keypoints
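The Doppelgangers item above is at heart a binary classifier over image-pair match statistics. A minimal sketch of that idea, not the actual Doppelgangers model: logistic regression on two hypothetical hand-crafted features (geometric-inlier ratio and keypoint spread), with synthetic data standing in for real image pairs.

```python
# Sketch only: classify an image pair as a true match vs an illusory match
# of a similar-looking structure. Features and data are illustrative
# assumptions, not the Doppelgangers pipeline.
import numpy as np

rng = np.random.default_rng(0)

def make_pairs(n):
    """Synthetic pair features: [geometric-inlier ratio, keypoint spread]."""
    true = rng.normal([0.8, 0.7], 0.08, size=(n, 2))   # true matches
    fake = rng.normal([0.3, 0.25], 0.08, size=(n, 2))  # illusory matches
    x = np.vstack([true, fake])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    return x, y

def train_logreg(x, y, lr=0.5, steps=500):
    """Plain gradient-descent logistic regression."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted match probability
        g = p - y                               # gradient w.r.t. the logit
        w -= lr * (x.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

x, y = make_pairs(200)
w, b = train_logreg(x, y)
print("train accuracy:", (((x @ w + b) > 0) == y).mean())
```

A real pipeline would compute such statistics from actual keypoint matches between the two images; the "spatially distribute local keypoints" note suggests spread-style features are part of the signal.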