Complex Physics with Graph Networks https://arxiv.org/pdf/2002.09405
Scaling Spherical CNNs: vs graph neural network, molecular
better than the spectral domain through the convolution theorem
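A minimal sketch of the convolution theorem mentioned above, assuming plain 1D circular signals rather than functions on the sphere (the spherical case would use spherical harmonics instead of a DFT). The helper names `dft`, `circular_conv_direct` and `circular_conv_spectral` are made up for illustration. It only shows that convolving directly and multiplying spectra pointwise give the same result.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse if requested)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out

def circular_conv_direct(a, b):
    """Circular convolution computed directly in the signal domain."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

def circular_conv_spectral(a, b):
    """Same convolution via the convolution theorem: multiply spectra pointwise."""
    fa, fb = dft(a), dft(b)
    prod = [u * v for u, v in zip(fa, fb)]
    return [z.real for z in dft(prod, inverse=True)]

# Hypothetical toy signal and kernel of equal length.
signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.25, 0.0, 0.25]
direct = circular_conv_direct(signal, kernel)
spectral = circular_conv_spectral(signal, kernel)
```

Both lists agree up to floating-point error; with a fast transform the spectral route costs O(n log n) instead of O(n^2), which is the usual reason to go through the spectral domain.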
MoGe can turn images and videos into 3D point maps!
DUSt3R Geometric 3D Vision Made Easy
global alignment of pixels from sparse views, no need for camera position
3D molecule generation by denoising voxel grids
diffusion model applied to atom point clouds
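A rough sketch of what "diffusion applied to atom point clouds" means in practice, assuming the standard DDPM-style forward noising step x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps. This is a generic illustration, not the formulation of any specific paper in these notes; `forward_noise` and the three-atom coordinates are hypothetical.

```python
import math
import random

def forward_noise(points, alpha_bar, rng):
    """DDPM-style forward step applied independently to every coordinate."""
    keep = math.sqrt(alpha_bar)          # how much of the clean signal survives
    spread = math.sqrt(1.0 - alpha_bar)  # how much Gaussian noise is mixed in
    return [
        tuple(keep * c + spread * rng.gauss(0.0, 1.0) for c in p)
        for p in points
    ]

# Hypothetical 3-atom point cloud (x, y, z coordinates, arbitrary units).
atoms = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
rng = random.Random(0)
clean = forward_noise(atoms, alpha_bar=1.0, rng=rng)  # alpha_bar = 1: no noise
noisy = forward_noise(atoms, alpha_bar=0.5, rng=rng)  # signal and noise mixed
```

A denoising model would then be trained to recover the clean coordinates (or the added noise) from `noisy`; that part is omitted here.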
DiffPoint Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model
divide the noisy point clouds into irregular patches, target points based on input images
PointInfinity Resolution-Invariant Point Diffusion Models
efficient training on low-resolution point clouds, allowing high-resolution point clouds to be generated during inference
transformer-based architecture with a fixed-size, resolution-invariant latent representation
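A small sketch of why a fixed-size latent can be resolution-invariant, assuming plain softmax cross-attention with no learned projections (the actual PointInfinity architecture is not reproduced here). `cross_attention_read`, `random_cloud` and the latent size of 4 are illustrative assumptions. The latent vectors act as queries over however many points are given, so the output size never depends on the point count.

```python
import math
import random

def cross_attention_read(latent, points):
    """Each latent vector attends over all points; output has len(latent) rows."""
    dim = len(latent[0])
    out = []
    for q in latent:
        # Scaled dot-product scores between one latent query and every point.
        scores = [sum(qi * pi for qi, pi in zip(q, p)) / math.sqrt(dim) for p in points]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Softmax-weighted average of the point coordinates.
        out.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                    for d in range(dim)])
    return out

rng = random.Random(0)

def random_cloud(n, dim=3):
    """Hypothetical stand-in for a point cloud with n points."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]

latent = random_cloud(4)  # fixed-size latent: 4 vectors (assumed size)
low_res = cross_attention_read(latent, random_cloud(16))
high_res = cross_attention_read(latent, random_cloud(256))
```

Both `low_res` and `high_res` have 4 rows, so the same downstream layers can process a 16-point training cloud and a 256-point inference cloud.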
biology inspired AI: genes or local-context-evolution https://youtu.be/vf18FLdKkY4 CPPN algorithm
next paper: hyper http://axon.cs.byu.edu/~dan/778/papers/NeuroEvolution/stanley3**.pdf
continuation: http://eplex.cs.ucf.edu/ESHyperNEAT/
neat algorithm: https://youtu.be/3nbvrrdymF0
The AI Epiphany NeuroEvolution of Augmenting Topologies (NEAT) and Compositional Pattern Producing Networks (CPPN)
Forward-Forward (vs Backpropagation), analog computers
JEPA - https://youtu.be/jSdHmImyUjk
Self-Supervised Learning, Energy-Based Models, and hierarchical predictive
the encoder ignoring useless information
Energy Transformer more efficient, electric
transformers without skip connections or normalisation layers https://arxiv.org/pdf/2302.10322.pdf
Conformers: local and global attention
NERF ALIKES representations for video-image
supervision time objects spend in the zone
like employees at their table, cars at parking lot
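A minimal sketch of measuring the time objects spend in a zone, assuming a tracker already yields `(timestamp, object_id, x, y)` detections and the zone is an axis-aligned rectangle. The function name `dwell_times`, the data and the zone coordinates are all hypothetical. Only the time between consecutive in-zone detections is counted.

```python
def dwell_times(detections, zone):
    """Sum, per object id, the time between consecutive in-zone detections."""
    x_min, y_min, x_max, y_max = zone
    totals = {}
    last_seen = {}  # object id -> timestamp of its previous in-zone detection
    for t, obj_id, x, y in sorted(detections):
        inside = x_min <= x <= x_max and y_min <= y <= y_max
        if inside and obj_id in last_seen:
            totals[obj_id] = totals.get(obj_id, 0.0) + (t - last_seen[obj_id])
        if inside:
            last_seen[obj_id] = t
            totals.setdefault(obj_id, 0.0)
        else:
            last_seen.pop(obj_id, None)  # object left the zone: reset its timer
    return totals

# Hypothetical tracker output: (timestamp in seconds, object id, x, y).
detections = [
    (0.0, "car_1", 2.0, 2.0),
    (1.0, "car_1", 2.5, 2.0),
    (2.0, "car_1", 9.0, 9.0),  # left the zone
    (0.0, "car_2", 8.0, 8.0),  # never inside the zone
    (3.0, "car_1", 2.0, 2.0),  # re-entered
    (5.0, "car_1", 2.0, 2.5),
]
parking_zone = (0.0, 0.0, 5.0, 5.0)
times = dwell_times(detections, parking_zone)
```

Here `car_1` accumulates 1 second on its first visit and 2 seconds on its second, and `car_2` does not appear in the result because it never enters the zone.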
Interactive Garment Recommendation with User in the Loop (algorithm)
ingesting user feedback to improve its recommendations and maximize user satisfaction
OS-Copilot Towards Generalist Computer Agents with Self-Improvement
strong generalization to unseen applications via accumulated skills from previous tasks
OSWorld Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
accomplish complex computer tasks with minimal human intervention
multimodal agents
open-source Rabbit
WebVoyager Building an End-to-End Web Agent with Large Multimodal Models
interacting with real-world websites
DRAGTEX mesh
DRAGANYTHING motion control in video
Neuralfeels with neural fields: Visuo-tactile perception for in-hand manipulation
tracking and reconstruction of novel objects for in-hand manipulation
MACS Mass Conditioned 3D Hand and Object Motion Synthesis
improve naturalness of the synthesized 3D hand object motions
generalize to unseen masses
CyberDemo Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
simulated human demonstrations for real-world tasks
SCO-VIST Social Interaction Commonsense Knowledge-based Visual Storytelling
takes a graph representing plot points and creates bridges between plot points
PAM: A Parallel Attention Network for Cattle Face Recognition
focuses on local and global features
for animal husbandry and behavioral research
Towards mitigating uncann(eye)ness in face swaps via gaze-centric loss terms
novel loss equation for the training of face swapping models
A Unified and Interpretable Emotion Representation and Expression Generation
compound emotions
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
unpaired text-only data used to enhance paired audio-text data
to detect turns
VideoReTalking Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
CityDreamer Compositional Generative Model of Unbounded 3D Cities (imagining map layout city)
Active Neural Mapping; scene reconstruction, gain knowledge of the environment
Doppelgangers Learning to Disambiguate Images of Similar Structures
can distinguish illusory matches in difficult cases, using the spatial distribution of local keypoints