parent: domain
SafeDiffuser: Safe Planning with Diffusion Probabilistic Models
Building Cooperative Embodied Agents Modularly with Large Language Models
Planning with Diffusion for Flexible Behavior Synthesis
PlaSma: Making Small Language Models Better Procedural Knowledge Models for (Counterfactual) Planning
procedural knowledge distilled from an LLM into small LMs; counterfactual planning means revising a plan to cope with a counterfactual situation
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
treats the entire action space as a decision tree, then identifies the lowest-cost valid path as the solution
==cheapest cost decision==
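A minimal sketch of the A* search underneath, assuming a hand-written action tree with fixed step costs and a made-up admissible heuristic (the tool names, costs, and estimates are hypothetical). ToolChain* defines its own task-specific cost and heuristic functions, which are not reproduced here; this only shows how f = g + h picks the cheapest path.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical action tree: state -> [(action, next_state, step_cost), ...]
Tree = Dict[str, List[Tuple[str, str, float]]]


def a_star(tree: Tree, start: str, is_goal: Callable[[str], bool],
           h: Callable[[str], float]) -> Optional[Tuple[float, List[str]]]:
    """Return (cost, actions) of the cheapest path from start to a goal state."""
    tie = itertools.count()  # tie-breaker so heapq never compares action lists
    frontier = [(h(start), 0.0, next(tie), start, [])]
    best_g = {start: 0.0}
    while frontier:
        _, g, _, state, actions = heapq.heappop(frontier)
        if is_goal(state):
            return g, actions
        if g > best_g[state]:
            continue  # outdated entry: a cheaper path to this state was found later
        for action, nxt, cost in tree.get(state, []):
            g_new = g + cost  # g: cumulative cost of the actions taken so far
            if g_new < best_g.get(nxt, float("inf")):
                best_g[nxt] = g_new
                # f = g + h: cost so far plus estimated remaining cost
                heapq.heappush(frontier, (g_new + h(nxt), g_new, next(tie), nxt, actions + [action]))
    return None


tree: Tree = {
    "start": [("search_flights", "flights", 2.0), ("search_hotels", "hotels", 1.0)],
    "flights": [("book_flight", "done", 1.0)],
    "hotels": [("search_flights", "flights_after_hotels", 3.0)],
    "flights_after_hotels": [("book_flight", "done", 1.0)],
}
estimate = {"start": 2.0, "flights": 1.0, "hotels": 1.0, "flights_after_hotels": 1.0, "done": 0.0}
result = a_star(tree, "start", lambda s: s == "done", lambda s: estimate[s])
print(result)  # (3.0, ['search_flights', 'book_flight'])
```

The cheaper first step (`search_hotels`) is expanded first but abandoned once its total cost exceeds the other branch, which is the "cheapest cost decision" behavior.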
K-Level Reasoning with Large Language Models
decision-making in evolving environments, dynamic reasoning
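To make "k-level" concrete, here is the classic level-k thinking model on a "guess p times the average" game. This is an illustration chosen here, not the paper's LLM prompting method; the game, p = 2/3, and the anchor of 50 are assumptions. Level 0 plays a fixed anchor, and each higher level best-responds to opponents it assumes reason one level lower.

```python
from fractions import Fraction


def level_k_guess(k: int, p: Fraction = Fraction(2, 3), anchor: Fraction = Fraction(50)) -> Fraction:
    """Guess of a level-k player in the 'guess p times the average' game.

    Level 0 is non-strategic and plays a fixed anchor. A level-k player
    predicts that every opponent reasons at level k-1, then best-responds
    (ignoring its own small effect on the average) by guessing p times
    that prediction.
    """
    if k == 0:
        return anchor
    predicted_opponent_guess = level_k_guess(k - 1, p, anchor)  # model opponents one level down
    return p * predicted_opponent_guess


for k in range(4):
    print(k, level_k_guess(k))  # 0 50, 1 100/3, 2 200/9, 3 400/27
```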
TravelPlanner: A Benchmark for Real-World Planning with Language Agents
LLMs struggle: even the best agent (GPT-4) reaches a success rate of only 0.6% on travel planning
CodePlan: Repository-level Coding using LLMs and Planning
uses context derived from the entire repository and from previous code changes
tasks: package migration, fixing error reports from static analysis or testing, and adding type annotations or other specifications
Generative Agents: Interactive Simulacra of Human Behavior (The Sims-style sandbox)
Discovering Adaptable Symbolic Algorithms from Scratch
evolves (activates) safe control policies that avoid falling when individual limbs suddenly break
Dynalang: Learning to Model the World with Language
agents that leverage diverse language, both describing the state of the world and giving feedback
Diffusion-CCSP: Compositional Diffusion-Based Continuous Constraint Solvers
generalizes to novel combinations of known constraints
Dolphins: Multimodal Language Model for Driving
holistic understanding of intricate driving scenarios and multimodal instructions
PhotoBot: Reference-Guided Interactive Photography via Natural Language
takes photos with the best poses (cinematography), perspectives, and POVs
STEVE-1: A Generative Model for Text-to-Behavior in Minecraft
unCLIP is effective for creating instruction-following sequential decision-making agents
builds on pretrained models like VPT and MineCLIP, so STEVE-1 costs just $60 to train
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
multimodal memory, planning using both pre-trained knowledge and actual game experiences
BeTAIL: Behavior Transformer Adversarial Imitation Learning from Human Racing Gameplay
without requiring hand-designed reward functions
DAG Amendment for Inverse Control of Parametric Shapes
depending on the size and location of the brush, infers the user's intention
and modifies the hyperparameters: not just one axis but whole arm mechanisms
RT-X (Open X-Embodiment): the largest open-source robot dataset
MimicPlay: imitation learning algorithm that extracts the most signal from unlabeled human motions
BloombergGPT: A Large Language Model for Finance (economy)
Diffusion-based Generation, Optimization, and Planning in 3D Scenes
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
uses 2D foundation models, then fuses their outputs into 3D by multi-view association
complex reasoning over spatial and semantic concepts.
LangNav: Language as a Perceptual Representation for Navigation
selects an action (given the instruction) based on the current view and the trajectory history
3D-GPT: Procedural 3D Modeling with Large Language Models
instruction-driven 3D modeling
evolves (and enhances) the models' detailed forms while dynamically adapting to subsequent instructions
Image Synthesis with Graph Conditioning: CLIP-Guided Diffusion Models for Scene Graphs
leverages CLIP scene understanding instead of layouts; GAN-based
Text2Street: Controllable Text-to-image Generation for Street Views
text-to-map generation integrating road structure and topology, object layout, and weather description
SemCity: Semantic Scene Generation with Triplane Diffusion (refinement and inpainting)
Procedural terrain generation with style transfer
transfers style from real-world height maps onto Perlin noise
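A minimal sketch of the Perlin-noise side only: fractal 2D gradient noise as a base height map, in pure Python. The style-transfer step from real-world height maps is not shown, and the octave count, lattice sizes, and persistence are arbitrary assumptions.

```python
import math
import random
from typing import List

Grid = List[List[float]]


def fade(t: float) -> float:
    """Perlin's quintic ease curve 6t^5 - 15t^4 + 10t^3."""
    return t * t * t * (t * (t * 6 - 15) + 10)


def lerp(a: float, b: float, t: float) -> float:
    return a + t * (b - a)


def perlin_octave(size: int, cells: int, rng: random.Random) -> Grid:
    """One octave of 2D Perlin gradient noise sampled on a size x size grid."""
    # Random unit gradient vector at every lattice corner
    grads = []
    for _ in range(cells + 1):
        row = []
        for _ in range(cells + 1):
            angle = rng.uniform(0.0, 2.0 * math.pi)
            row.append((math.cos(angle), math.sin(angle)))
        grads.append(row)

    def corner(y0: int, x0: int, fy: float, fx: float, dy: int, dx: int) -> float:
        # Dot product of the corner's gradient with the offset from that corner
        gy, gx = grads[y0 + dy][x0 + dx]
        return gy * (fy - dy) + gx * (fx - dx)

    out: Grid = []
    for i in range(size):
        y = i * cells / size
        y0, fy = int(y), y - int(y)
        line = []
        for j in range(size):
            x = j * cells / size
            x0, fx = int(x), x - int(x)
            # Blend the four corner contributions with the eased local coordinates
            top = lerp(corner(y0, x0, fy, fx, 0, 0), corner(y0, x0, fy, fx, 0, 1), fade(fx))
            bottom = lerp(corner(y0, x0, fy, fx, 1, 0), corner(y0, x0, fy, fx, 1, 1), fade(fx))
            line.append(lerp(top, bottom, fade(fy)))
        out.append(line)
    return out


def fractal_heightmap(size: int = 64, octaves: int = 3, base_cells: int = 4,
                      persistence: float = 0.5, seed: int = 0) -> Grid:
    """Sum octaves (lattice doubles each time, amplitude scales by persistence), then map to [0, 1]."""
    rng = random.Random(seed)
    height = [[0.0] * size for _ in range(size)]
    amplitude = 1.0
    for k in range(octaves):
        layer = perlin_octave(size, base_cells * 2 ** k, rng)
        for i in range(size):
            for j in range(size):
                height[i][j] += amplitude * layer[i][j]
        amplitude *= persistence
    lo = min(min(r) for r in height)
    hi = max(max(r) for r in height)
    return [[(v - lo) / (hi - lo) for v in r] for r in height]


hm = fractal_heightmap()
```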
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion
optimizes a 3D Gaussian Splatting representation; also allows 3D synthesis from a single image
DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling
multi-timestep sampling strategy guided by the formation patterns of 3D objects
enables targeted adjustments
BlockFusion: Expandable 3D Scene Generation using Latent Tri-plane Extrapolation
semantically and geometrically meaningful transitions that harmoniously blend with the existing scene
2D layout conditioning for control
InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior
uses a semantic graph prior and a layout decoder
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
uses an LLM to generate an initial layout as a geometric constraint
Sketch-to-Architecture: Generative AI-aided Architectural Design
generate conceptual floorplans and 3D models from simple sketches
RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture
uses a cubemap with depth (object plane to screen plane) and distance maps
Holodeck: promptable system that can generate diverse, customized, and interactive 3D environments
AI4Animation: Neural State Machine for Character-Scene Interactions
NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis
neural interaction field attached to a specific object
guided diffusion model trained on generated synthetic data
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text
text-to-motion across various locations with specific motions
motion semantic trajectory constraint
CHOIS: Controllable Human-Object Interaction Synthesis
diffusion with constraints
language informs style and intent
waypoints ground the motion and can be effectively extracted using high-level planning methods
ROAM: Robust and Object-aware Motion Generation using Neural Pose Descriptors
method for human-object interaction synthesis
given an unseen object, optimises for the closest match in the feature space
TRUMANS: Scaling Up Dynamic Human-Scene Interaction Modeling
15 hours of human interactions across 100 indoor scenes
diffusion-based autoregressive model that efficiently generates HSI sequences of any length
3D-LLM: Injecting the 3D World into Large Language Models
LLM with 3D understanding such as spatial relationships, affordances, physics, and layout
can take 3D point clouds and their features as input
Physically Grounded Vision-Language Models for Robotic Manipulation
planning on tasks that require reasoning about physical object concepts
Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM
efficient long-sequence motion generation
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
method to automatically improve the quality of LLM instructions
Creative Robot Tool Use with Large Language Models
takes instructions as input and outputs executable code for controlling robots that use tools
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis (the website is the scene)
LLM-driven agent to complete instruction tasks on real websites
A Zero-Shot Language Agent for Computer Control with Structured Reflection
partially observed environment, iteratively learning from its mistakes, structured thought management
SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code
models a scene graph as a blueprint, detailing spatial relationships among assets in the scene
then writes Blender Python scripts based on this graph, translating relationships into numerical constraints for asset layout
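A toy, rule-based sketch of the "relations to numerical constraints to Blender script" step. The relation names, asset sizes, gap value, and the convention that `location` is the center of an asset's base are all hypothetical; SceneCraft has an LLM write the script, which this hand-coded version does not attempt. `bpy.data.objects[...].location` is the standard Blender Python way to set an object's position, and the emitted snippet assumes objects with those names already exist in the scene.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Asset:
    name: str
    size: Tuple[float, float, float]                         # (x width, y depth, z height)
    location: Tuple[float, float, float] = (0.0, 0.0, 0.0)  # assumed: center of the base


# Scene-graph edge: (subject, relation, reference). Relations are hypothetical;
# edges are ordered so each reference is placed before it is used.
Edge = Tuple[str, str, str]


def solve_layout(assets: Dict[str, Asset], edges: List[Edge], gap: float = 0.1) -> None:
    """Turn each spatial relation into a numerical placement of the subject."""
    for subj, rel, ref in edges:
        s, r = assets[subj], assets[ref]
        rx, ry, rz = r.location
        if rel == "on_top_of":
            s.location = (rx, ry, rz + r.size[2])  # rest on the reference's top face
        elif rel == "left_of":
            s.location = (rx - (r.size[0] + s.size[0]) / 2 - gap, ry, rz)  # side by side along -x
        elif rel == "behind":
            s.location = (rx, ry + (r.size[1] + s.size[1]) / 2 + gap, rz)  # side by side along +y
        else:
            raise ValueError(f"unknown relation: {rel}")


def to_blender_script(assets: Dict[str, Asset]) -> str:
    """Emit a bpy snippet as text; assumes objects with these names already exist."""
    lines = ["import bpy"]
    for a in assets.values():
        x, y, z = a.location
        lines.append(f'bpy.data.objects["{a.name}"].location = ({x:.2f}, {y:.2f}, {z:.2f})')
    return "\n".join(lines)


assets = {
    "table": Asset("table", (1.2, 0.8, 0.75)),
    "lamp": Asset("lamp", (0.3, 0.3, 0.5)),
    "chair": Asset("chair", (0.5, 0.5, 0.9)),
}
solve_layout(assets, [("lamp", "on_top_of", "table"), ("chair", "left_of", "table")])
script = to_blender_script(assets)
print(script)
```

The generated text can be pasted into Blender's Python console; the layout math itself runs without Blender.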
Pixel-Wise Color Constancy via Smoothness Techniques in Multi-Illuminant Scenes
filter against abnormal lighting that learns pixel-wise illumination maps produced by multiple light sources
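Whatever model predicts the illumination map, applying it is typically a per-pixel, per-channel division (a von Kries-style correction). A minimal sketch, assuming the map is already given and that each pixel's illuminant is normalized to mean 1 so only color, not brightness, changes; this is not the paper's smoothness-based estimation.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
Image = List[List[RGB]]


def apply_illumination_map(image: Image, illum: Image, eps: float = 1e-6) -> Image:
    """Per-pixel von Kries correction: divide every pixel by its own illuminant.

    Each illuminant is first scaled to mean 1 (an assumed normalization) so a
    white surface maps to a neutral gray of the same average brightness.
    """
    out: Image = []
    for img_row, ill_row in zip(image, illum):
        row: List[RGB] = []
        for (r, g, b), (lr, lg, lb) in zip(img_row, ill_row):
            mean = (lr + lg + lb) / 3
            # eps guards against division by a zero-valued illuminant channel
            row.append(tuple(c * mean / max(l, eps) for c, l in ((r, lr), (g, lg), (b, lb))))
        out.append(row)
    return out


# A gray surface (reflectance 0.5) under warm light on the left, cool light on the right
illum = [[(1.2, 1.0, 0.8), (0.8, 1.0, 1.2)]]
image = [[(0.6, 0.5, 0.4), (0.4, 0.5, 0.6)]]
corrected = apply_illumination_map(image, illum)
print(corrected)
```

A single global illuminant could not neutralize both pixels in this example at once, which is the multi-illuminant case pixel-wise maps are meant for.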
PoseDiffusion: Solving Pose Estimation via Diffusion-aided Bundle Adjustment
modelling the distribution of camera poses given input images
Detector-Free Structure from Motion
Effective Whole-body Pose Estimation with Two-stages Distillation
alternative to the OpenPose preprocessor
DECO: Dense Estimation of 3D Human-Scene Contact In The Wild
recognize 3D contact between body and objects
Pose Anything: A Graph-Based Approach for Category-Agnostic Pose Estimation
people, animals, furniture, faces
Reconstructing Close Human Interactions from Multiple Views
takes multi-view 2D keypoint heatmaps as input and reconstructs the pose of each individual
Extreme Two-View Geometry From Object Poses with Diffusion Models
extreme viewpoint changes, with no co-visible regions in the images
Real-time Monocular Full-body Capture in World Space via Sequential Proxy-to-Motion Learning
body tracking, only one view needed
D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement
refine the output of any existing 3D pose estimator (monocular camera-based 3D pose estimation)
SMPLer: monocular 3D human motion capture (motion capture from any video)
GPAvatar: Generalizable and Precise Head Avatar from Image(s)
recreate the head avatar and precisely control expressions and postures
IMUSIC: IMU-based Facial Expression Capture
facial expression capture using purely IMU signals
privacy-protecting; hybrid capture that is robust to occlusions; detects movements that are often invisible to cameras
Coverage Axis++: Efficient Inner Point Selection for 3D Shape Skeletonization
strategy that considers both shape coverage and uniformity to derive skeletal points
Understanding 3D Object Interaction from a Single Image
Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation
combines 3D geometry understanding (tokens) with rich 2D semantics
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
scene and object caption, object referral
Learning Generalizable Feature Fields for Mobile Manipulation
GeFF (Generalizable Feature Fields)
for both navigation and manipulation in real time