:PROPERTIES:
:ID: 787f08d5-50c1-49aa-a71a-1cbff1874f8b
:END:
#+title: mesh
#+filetags: :neuralnomicon:
#+SETUPFILE: https://fniessen.github.io/org-html-themes/org/theme-readtheorg.setup

- [[id:802df88e-d2f7-4849-9def-43190e1cebde][SCENE SYNTHESIS]] [[id:1b2337de-9064-40e4-a82c-3e8e48084ad0][GAUSSIAN GENERATION]] [[id:5e1ee0b4-8493-44e4-b0cf-89b429a78532][LAYOUT]]

* OTHERS
** DIFFERENT PRIMITIVE
*** FLEXIDREAMER
- [[https://flexidreamer.github.io/][FlexiDreamer]]: Single Image-to-3D Generation with FlexiCubes
  - leveraging a flexible gradient-based extraction known as FlexiCubes
  - direct acquisition of the target mesh
  - 1 minute; first extracts the mesh, then textures it
** MESH HUMAN FEEDBACK
:PROPERTIES:
:ID: 411493fe-a082-477f-923c-9a048dab036e
:END:
- [[https://twitter.com/_akhaliq/status/1771006768702693601][DreamReward]]: [[https://jamesyjl.github.io/DreamReward/][Text-to-3D]] Generation with Human Preference
  - 3D reward model
* CAD
- [[https://arxiv.org/abs/2401.15563][BrepGen]]: A B-rep Generative Diffusion Model with Structured Latent Geometry
  - structured latent geometry in a hierarchical tree, representing a whole CAD solid
  - generating complicated geometry
* FROM SOME SOURCE
- [[https://twitter.com/_akhaliq/status/1666643196120637443][ARTIC3D]]: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections
  - using 2D diffusion as a prior
- [[id:c724444e-f844-4d34-ae6f-e76239bdadbd][PIXEL ALIGNMENT]]
** SKETCH
- [[https://twitter.com/_akhaliq/status/1678597161360019459][Sketch-A-Shape]]: Zero-Shot Sketch-to-3D Shape Generation
  - turns both sketches and CADs into meshes
  - without requiring any paired datasets during training
- [[https://arxiv.org/abs/2401.14257][Sketch2NeRF]]: Multi-view Sketch-guided Text-to-3D Generation (sketch control for 3D generation)
- [[https://arxiv.org/abs/2402.03690][3Doodle]]: Compact Abstraction of Objects with 3D Strokes
  - abstract sketches containing the 3D characteristic shapes of the objects
- [[https://arxiv.org/pdf/2404.01843.pdf][Sketch3D]]: Style-Consistent Guidance for Sketch-to-3D Generation
  - generating realistic 3D Gaussians consistent with the color style described in the textual description
** SIMPLIFICATION
- [[https://twitter.com/_akhaliq/status/1671361950679277568][Point-Cloud]] Completion with Pretrained Text-to-image Diffusion Models
  - from an incomplete point cloud to a 3D model
- [[https://twitter.com/_akhaliq/status/1678969402107076609][Differentiable Blocks]] World: Qualitative 3D Decomposition by Rendering Primitives
  - scenes turned into big blocks; interpretable, easy to manipulate and suited for physics-based simulations
- FlexiCubes: [[https://twitter.com/_akhaliq/status/1689844535780614145][Flexible Isosurface]] Extraction for Gradient-Based Mesh Optimization
  - hierarchically-adaptive meshes, local flexible adjustments
- [[https://twitter.com/_akhaliq/status/1770272886063763586][ComboVerse]]: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
  - learning to combine multiple models
  - adjusts their sizes, rotation angles, and locations to create a 3D asset that matches the given image
- [[https://getmesh.github.io/][GetMesh]]: A Controllable Model for High-quality Mesh Generation and Manipulation
  - mesh autoencoder; points are re-organized into a triplane representation by projecting them onto the planes (see the triplane sketch below)
  - combining mesh parts across categories, adding/removing mesh parts
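The triplane representation used by GetMesh (and by TGS, CRM and Compress3D further down) stores features on three axis-aligned planes and queries them by projecting a 3D point onto each plane. A minimal sketch of that lookup, with purely illustrative shapes and function names, not any specific paper's code:

#+begin_src python
import torch
import torch.nn.functional as F

def sample_triplane(planes, points):
    """planes: (3, C, H, W) feature planes for the XY, XZ, YZ planes.
    points: (N, 3) coordinates already normalized to [-1, 1].
    Returns (N, 3*C) concatenated features for each point."""
    xy = points[:, [0, 1]]
    xz = points[:, [0, 2]]
    yz = points[:, [1, 2]]
    feats = []
    for plane, uv in zip(planes, (xy, xz, yz)):
        # grid_sample expects a (1, H_out, W_out, 2) sampling grid in [-1, 1]
        grid = uv.view(1, -1, 1, 2)
        f = F.grid_sample(plane[None], grid, align_corners=True)  # (1, C, N, 1)
        feats.append(f[0, :, :, 0].t())                           # (N, C)
    return torch.cat(feats, dim=-1)

# toy usage: 32-channel planes at 64x64 resolution, 1000 query points
planes = torch.randn(3, 32, 64, 64)
points = torch.rand(1000, 3) * 2 - 1
print(sample_triplane(planes, points).shape)  # torch.Size([1000, 96])
#+end_src

The concatenated per-point feature is what a small MLP then decodes into occupancy, SDF, color, or Gaussian parameters, depending on the method.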
*** LIFTING
:PROPERTIES:
:ID: 080ee2fd-d3ac-400f-a5f5-f97178f98782
:END:
- [[https://twitter.com/_akhaliq/status/1737327939186753776][3D-LFM]]: Lifting Foundation Model ==best==
  - lifting of 3D structure and camera from 2D landmarks
- [[https://arxiv.org/abs/2401.15841][2L3]]: Lifting Imperfect Generated 2D Images into Accurate 3D
  - utilizes multi-view 3D reconstruction to fuse generated MV images into consistent 3D objects
** IMAGE TO 3D
- [[https://scene-dreamer.github.io/][SceneDreamer]]: [[https://github.com/FrozenBurning/SceneDreamer][Unbounded]] 3D Scene Generation from 2D Image Collections (landscapes, scenarios)
- [[https://twitter.com/_akhaliq/status/1674618752917295105][Michelangelo]]: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation
  - image + text + shape prior = mesh
- [[https://twitter.com/_akhaliq/status/1674617785119305728][One-2-3-45]]: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization
  - uses SD to force consistency of the diffusion generation
- [[https://twitter.com/_akhaliq/status/1675684794653351936][Magic123]]: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors ==best==
  - priors used in both stages; the first stage produces coarse geometry, then details are added
- [[https://twitter.com/_akhaliq/status/1725028222398546071][DMV3D]]: Denoising Multi-View Diffusion using 3D Large Reconstruction Model (generates a NeRF)
- [[https://twitter.com/_akhaliq/status/1732226396636774551][ImageDream]]: [[https://github.com/bytedance/ImageDream][Image-Prompt]] Multi-view Diffusion for 3D Generation ==best==
- [[https://twitter.com/_akhaliq/status/1737313852600041576][Customize-It-3D]]: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior
  - subject-specific and multi-modal diffusion model; produces a NeRF
  - enhances texture from coarse
- [[id:5762b4c1-e574-4ca5-9e38-032071698637][DREAMDISTRIBUTION]]
- [[https://twitter.com/_akhaliq/status/1740233154471084445][HarmonyView]]: [[https://github.com/byeongjun-park/HarmonyView][Harmonizing]] Consistency and Diversity in One-Image-to-3D
- [[https://twitter.com/sstj389/status/1747672689593246135][ZeroShape]]: [[https://zixuanh.com/projects/zeroshape.html][Regression-based]] Zero-shot Shape Reconstruction (just the shape)
  - trained to directly regress the object shape, computationally efficient
- [[https://twitter.com/_akhaliq/status/1769671284521066827][FDGaussian]]: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model
  - extracts 3D geometric features from the 2D input
  - incorporates attention to fuse images from different viewpoints
- [[https://github.com/fuxiao0719/GeoWizard][GeoWizard]]: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
  - encodes the image, depth, and normal into a latent space, fed into a U-Net to generate
- [[https://arxiv.org/abs/2403.13663][T-Pixel2Mesh]]: Combining Global and Local Transformer for 3D Mesh Generation from a Single Image
  - coarse-to-fine refinement of mesh details
- [[https://twitter.com/_akhaliq/status/1777873717512523978][Magic-Boost]]: Boost 3D Generation with Multi-View Conditioned Diffusion
  - refines coarse generative results, high consistency
*** FAST
- [[https://twitter.com/_akhaliq/status/1737690908135485815][Splatter]] Image: Ultra-Fast Single-View 3D Reconstruction (see the per-pixel Gaussian sketch below)
- [[https://x.com/yanpei_cao/status/1748001059694649837?s=20][TGS]]: Triplane meets Gaussian Splatting ==best==
  - fast single-view 3D reconstruction with transformers
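Splatter-Image-style reconstruction works by predicting one 3D Gaussian per input pixel in a single network pass. A rough sketch of such an output head, with invented layer sizes and a simplified Gaussian parameterization, not the paper's code:

#+begin_src python
import torch
import torch.nn as nn

class PerPixelGaussianHead(nn.Module):
    """Maps a (B, C, H, W) feature map to one 3D Gaussian per pixel:
    position offset (3), log-scale (3), rotation quaternion (4),
    opacity (1), RGB color (3) = 14 channels."""
    def __init__(self, in_channels=64):
        super().__init__()
        self.out = nn.Conv2d(in_channels, 14, kernel_size=1)

    def forward(self, feats):
        g = self.out(feats)                          # (B, 14, H, W)
        g = g.flatten(2).transpose(1, 2)             # (B, H*W, 14)
        pos, log_scale, quat, opacity, rgb = g.split([3, 3, 4, 1, 3], dim=-1)
        quat = nn.functional.normalize(quat, dim=-1)  # keep a valid rotation
        return {
            "position": pos,                         # offsets along the pixel rays
            "scale": log_scale.exp(),
            "rotation": quat,
            "opacity": opacity.sigmoid(),
            "color": rgb.sigmoid(),
        }

head = PerPixelGaussianHead()
gaussians = head(torch.randn(1, 64, 128, 128))
print(gaussians["position"].shape)  # torch.Size([1, 16384, 3])
#+end_src

The resulting Gaussians are rendered with a standard splatting rasterizer and supervised with image losses, which is what makes the whole pipeline feed-forward and fast.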
**** LRM
- [[https://twitter.com/_akhaliq/status/1722466519253242263][LRM]]: Large Reconstruction Model for Single Image to 3D (5 seconds)
  - learns to directly predict a NeRF from the image
- [[https://twitter.com/_akhaliq/status/1737855357631054141][OpenLRM]]: Open-Source Large Reconstruction Models
  - Image-to-3D in 10+ seconds!
*** ZERO-1-TO-3
- [[https://github.com/cvlab-columbia/zero123][Zero-1-to-3]]: [[https://zero123.cs.columbia.edu/][Zero-shot]] One Image to 3D Object; [[https://twitter.com/yvrjsharma/status/1716945484194177461][Zero123Plus]] is a newer model (see the pose-conditioning sketch at the end of this subtree)
- [[https://github.com/deepseek-ai/DreamCraft3D#dreamcraft3d][DreamCraft3D]]: [[https://github.com/deepseek-ai/DreamCraft3D][Hierarchical]] 3D Generation with Bootstrapped Diffusion Prior ==best==
  - an image guides the geometry sculpting and texture boosting
  - imbuing it with 3D knowledge of the scene being optimized = view-consistent guidance for the scene
- [[https://twitter.com/_akhaliq/status/1717381753948529133][Wonder3D]]: [[https://twitter.com/marc_habermann/status/1717808998236217463][Single]] Image to 3D using Cross-Domain Diffusion ==best==
  - geometry-aware normal fusion algorithm that merges the generated 2D representations
- [[id:44943c87-ca5b-4604-840e-ff52993c1bf1][PERFLOW]]
- [[https://github.com/chinhsuanwu/ifusion][iFusion]]: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views
  - repurposes Zero123 for camera pose estimation
- [[https://github.com/Wi-sc/ViewFusion][ViewFusion]]: Towards Multi-View Consistency via Interpolated Denoising
  - auto-regressive method; previously generated views serve as context for the next view generation
**** ISOTROPIC3D
:PROPERTIES:
:ID: 58953d7d-4769-4476-aa72-ced6f6f4d95b
:END:
- [[https://twitter.com/_akhaliq/status/1769544081208684586][Isotropic3D]]: Image-to-3D Generation Based on a Single CLIP Embedding
  - fine-tunes a text-to-3D diffusion model by substituting its text encoder with an image encoder
  - a single image CLIP embedding is used to generate multi-view images
**** HYPERDREAMER
- [[https://arxiv.org/pdf/2312.04543.pdf][HyperDreamer]]: Hyper-Realistic 3D Content Generation and Editing from a Single Image
  - interactively select regions via clicks and edit the texture with text guidance
  - modeling region-aware materials with high-resolution textures and enabling user-friendly editing
  - guidance: semantic segmentation and data-driven priors (albedo, roughness, specular properties)
**** ESCHERNET
- [[https://kxhit.github.io/EscherNet][EscherNet]]: A Generative Model for Scalable View Synthesis
  - self-attention over M target views to ensure target consistency, more consistent across frames
**** TRIPOSR
- [[https://twitter.com/_akhaliq/status/1764841524431392794][TripoSR]]:
  - capable of creating high-quality outputs in less than a second
  - [[https://twitter.com/blizaine/status/1766570817800765816][USDZ]] format
  - awesome Tripo HD using layer separation: https://twitter.com/toyxyz3/status/1803511432082051201
**** FDGAUSSIAN
:PROPERTIES:
:ID: 2c09738a-baef-4db0-a7d8-e9551c3dd2df
:END:
- [[https://arxiv.org/abs/2403.10242][FDGaussian]]: Fast Gaussian Splatting from Single Image via Geometric-aware Diffusion Model
  - extracts 3D geometric features from the 2D input
**** MVD-FUSION
- [[https://mvd-fusion.github.io/][MVD-Fusion]]: Single-view 3D via Depth-consistent Multi-view Generation
  - outputs multiple 3D representations such as point clouds, textured meshes, and Gaussian splats
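Zero-1-to-3-style novel-view diffusion conditions the denoiser on the relative camera transform between the input and the target view. A minimal sketch of how such a conditioning token can be built from (Δazimuth, Δelevation, Δradius) and a CLIP-style image embedding; the sizes and module name are illustrative, and real models feed this into the UNet cross-attention context:

#+begin_src python
import torch
import torch.nn as nn

class RelativePoseConditioner(nn.Module):
    """Fuses a relative camera move with an image embedding into one
    conditioning token. clip_dim=768 is an assumption, not a fixed value."""
    def __init__(self, clip_dim=768):
        super().__init__()
        self.proj = nn.Linear(clip_dim + 4, clip_dim)

    def forward(self, image_emb, d_azimuth, d_elevation, d_radius):
        # encode azimuth as (sin, cos) so it stays continuous across 2*pi
        pose = torch.stack(
            [d_elevation, torch.sin(d_azimuth), torch.cos(d_azimuth), d_radius],
            dim=-1,
        )                                          # (B, 4)
        cond = torch.cat([image_emb, pose], dim=-1)
        return self.proj(cond)                     # (B, clip_dim)

cond = RelativePoseConditioner()(
    torch.randn(2, 768),
    d_azimuth=torch.tensor([0.5, -1.0]),
    d_elevation=torch.tensor([0.1, 0.2]),
    d_radius=torch.tensor([0.0, 0.0]),
)
print(cond.shape)  # torch.Size([2, 768])
#+end_src

iFusion's pose estimation (above) inverts exactly this kind of conditioning: it optimizes the relative pose so that the frozen diffusion model best reconstructs the observed view.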
* MESH EDITION
- [[https://twitter.com/_akhaliq/status/1735521389619056726][SHAP-EDITOR]]: Instruction-guided Latent 3D Editing in Seconds
  - 3D editing directly by a feed-forward network, one second per edit
- [[https://twitter.com/_akhaliq/status/1737683046499782968][Repaint123]]: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting
  - visibility-aware adaptive repainting (view-consistent)
- [[id:303a8796-8fc8-4c2f-92f6-62516c8a6ea1][IMAGE SCULPTING]] ==best==
- [[id:55829fe3-d777-4723-8b48-5c9454822b5e][STABLEIDENTITY]]: inserting an identity
- [[https://twitter.com/rsasaki0109/status/1757694417757208628][threefiner]]: interface for text-guided mesh refinement
** INPAINTING
- [[https://arxiv.org/abs/2403.12470][SC-Diff]]: 3D Shape Completion with Latent Diffusion Models
  - realistic shape completions at superior resolutions
** MESH TO MESH
:PROPERTIES:
:ID: 9307c803-21ff-47bf-bdc1-15ea79d2444f
:END:
- [[https://twitter.com/_akhaliq/status/1769939012221886972][Generic 3D]] Diffusion Adapter Using Controlled Multi-View Editing
  - MVEdit as a 3D counterpart of SDEdit
  - the 3D adapter lifts 2D; also for texture generation
** MOUSE EDITING
- [[https://twitter.com/_akhaliq/status/1765243817949626819][MagicClay]]: Sculpting Meshes With Generative Neural Fields
  - maintains consistency between a mesh and a Signed Distance Field (SDF) representation
*** DRAGTEX
:PROPERTIES:
:ID: dc7fd8d1-1461-4882-a671-a5935e3d15b5
:END:
- [[https://arxiv.org/abs/2403.02217][DragTex]]: Generative Point-Based Texture Editing on 3D Mesh
  - dragging the texture
  - diffusion model to blend locally inconsistent textures between different views
  - enabling locally consistent texture editing
* MULTIVIEW DIFFUSION
:PROPERTIES:
:ID: 505848e8-02a5-4699-be28-6e7b2e91837c
:END:
- [[FLEXIDREAMER]]
- [[https://twitter.com/_akhaliq/status/1676070417922916352][MVDiffusion]]: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion, panorama
- [[https://mvdiffusion-plusplus.github.io/][MVDiffusion++]]: A Dense High-resolution Multi-view Diffusion Model for Single to Sparse-view 3D Object Reconstruction
  - 2D latent features learn 3D consistency
- [[https://github.com/SUDO-AI-3D/zero123plus][Zero123++]]: a Single Image to Consistent Multi-view Diffusion
- [[https://twitter.com/_akhaliq/status/1718830926199693413][ZeroNVS]]: Zero-Shot 360-Degree View Synthesis from a Single Real Image
  - single-image novel view synthesis; multi-object scenes with complex backgrounds, indoor and outdoor
  - camera conditioning parameterization and normalization scheme
- [[id:080ee2fd-d3ac-400f-a5f5-f97178f98782][LIFTING]] [[ZERO-1-TO-3]]
- [[https://twitter.com/_akhaliq/status/1641241817269108736][HOLODIFFUSION]]: Training a 3D Diffusion Model using 2D Images
- [[VIEWDIFF]]
** FROM VIDEO
- IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation
  - uses video diffusion instead, reducing the number of evaluations of the 2D generator network 10-100x
- [[https://twitter.com/_akhaliq/status/1767389571195470246][V3D]]: [[https://twitter.com/Gradio/status/1769706587759788384][Video]] Diffusion Models are Effective 3D Generators
  - given a single image
** FROM SD
- [[https://twitter.com/_akhaliq/status/1719228482968858833][CustomNet]]: [[https://jiangyzy.github.io/CustomNet/][Zero-shot]] Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
  - starts from SD checkpoints
  - integrates 3D novel view synthesis into the object customization process ==best==
  - with flexible background control
- [[https://twitter.com/_akhaliq/status/1732609017321521328][DreamComposer]]: Controllable 3D Object Generation via Multi-View Conditions ==best==
  - view-aware 3D lifting module obtains 3D representations which are injected into the 2D model
- [[3D ADDED TO 2D PRIOR]]
*** MVDREAM
- [[https://twitter.com/YangZuoshi/status/1697511835547939009][MVDream]]: [[https://twitter.com/victormustar/status/1709172492831477793][Multi-view]] [[https://twitter.com/_akhaliq/status/1697521847963619462][Diffusion]] for 3D Generation ==best one==
  - generates a NeRF without the Janus problem, from a normal 2D diffusion model
  - SD + a multi-view dataset rendered from 3D assets
  - 2D diffusion + consistency of 3D data (see the cross-view attention sketch below)
- [[https://twitter.com/_akhaliq/status/1755782988783083913][SPAD]]: [[https://yashkant.github.io/spad/][Spatially]] Aware Multiview Diffusers
  - utilizes Plucker coordinates derived from camera rays and injects them as positional encoding
  - Janus issue resolved
  - leverages multi-view Score Distillation Sampling (SDS) for 3D asset generation
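MVDream-style multiview diffusers get consistency by letting the denoiser attend jointly across all views of one object instead of per image. A schematic of that reshape-and-attend step, with illustrative dimensions and the assumption that views of the same object are contiguous in the batch; not any specific repo's code:

#+begin_src python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Self-attention over the tokens of all V views of the same object,
    so each view sees the others during denoising."""
    def __init__(self, dim=320, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, num_views):
        # x: (B * V, N, C) per-view token sequences from the UNet,
        # grouped so consecutive V entries belong to one object
        bv, n, c = x.shape
        b = bv // num_views
        x = x.view(b, num_views * n, c)          # concatenate tokens of all views
        x, _ = self.attn(x, x, x)                # joint attention across views
        return x.view(bv, n, c)

tokens = torch.randn(2 * 4, 256, 320)            # 2 objects x 4 views, 256 tokens each
out = CrossViewAttention()(tokens, num_views=4)
print(out.shape)  # torch.Size([8, 256, 320])
#+end_src

SPAD's Plucker-coordinate positional encoding and ViewDiff's cross-frame attention (further down) plug into the same place: extra per-view information is added before this joint attention so the model knows which camera each token belongs to.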
** NOVEL VIEW
:PROPERTIES:
:ID: 3a9a6a52-b3a6-4a69-b402-531b3b1e2d91
:END:
- [[https://nvlabs.github.io/genvs/][GeNVS]]: Generative Novel View Synthesis with 3D-Aware Diffusion Models (1 image to 3D video)
- [[https://twitter.com/_akhaliq/status/1674079121427554309][MVDiffusion]]: [[https://twitter.com/YasutakaFuruka1/status/1674083798689157120][Enabling]] [[https://mvdiffusion.github.io/][Holistic]] Multi-view Image Generation with Correspondence-Aware Diffusion
  - generates new consistent perspective views
- [[https://twitter.com/_akhaliq/status/1700007835508068491][SyncDreamer]]: Generating Multiview-consistent Images from a Single-view Image
  - multiview diffusion model that models the joint probability distribution of multiview images
  - 3D-aware feature attention
- [[https://twitter.com/_akhaliq/status/1738037495332217266][Carve3D]]: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning
  - Multi-view Reconstruction Consistency (MRC) metric
- [[id:2af8a345-7338-42f0-8dda-03b2eb69f22e][POSE - POSITION]]
*** GAUSSIAN NOVEL VIEW
:PROPERTIES:
:ID: 1b77b842-8ebe-4685-8cc9-0f7b9a47fe9a
:END:
- [[https://donydchen.github.io/mvsplat/][MVSplat]]: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
  - estimates depth via plane sweeping in 3D space with geometric cues; 10× fewer parameters, 2× faster (see the plane-sweep sketch below)
- [[id:cf480119-9ee7-41c5-8b55-4f51303baaad][SNAP-IT]]
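MVSplat-style depth comes from a plane-sweep cost volume: reference-view features are compared against source-view features warped at a set of depth hypotheses, and a soft-argmin over the hypotheses gives per-pixel depth. A bare-bones single-pair sketch, assuming shared intrinsics K and a relative pose (R, t) from the reference to the source camera; purely illustrative, not the paper's implementation:

#+begin_src python
import torch
import torch.nn.functional as F

def plane_sweep_cost_volume(feat_ref, feat_src, K, R, t, depths):
    """feat_*: (C, H, W) feature maps; K: (3, 3) shared intrinsics;
    R, t: map reference-camera coords into source-camera coords.
    Returns a (D, H, W) cost volume over the depth hypotheses."""
    C, H, W = feat_ref.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().reshape(3, -1)
    rays = K.inverse() @ pix                       # reference rays at depth 1
    costs = []
    for d in depths:
        pts = rays * d                             # 3D points at hypothesis depth d
        pts_src = R @ pts + t[:, None]             # into the source camera frame
        uv = K @ pts_src
        uv = uv[:2] / uv[2:].clamp(min=1e-6)       # perspective divide
        grid = torch.stack([uv[0] / (W - 1) * 2 - 1,
                            uv[1] / (H - 1) * 2 - 1], -1).reshape(1, H, W, 2)
        warped = F.grid_sample(feat_src[None], grid, align_corners=True)[0]
        costs.append((warped * feat_ref).sum(0) / C ** 0.5)   # feature correlation
    return torch.stack(costs, 0)

# usage sketch:
# depths = torch.linspace(0.5, 10.0, 32)
# volume = plane_sweep_cost_volume(f_ref, f_src, K, R, t, depths)
# depth  = (volume.softmax(0) * depths.view(-1, 1, 1)).sum(0)   # soft-argmin
#+end_src

The predicted depth map is then unprojected to place the Gaussian centers, which is why the whole pipeline stays feed-forward.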
*** DREAMDISTRIBUTION
:PROPERTIES:
:ID: 5762b4c1-e574-4ca5-9e38-032071698637
:END:
- [[https://twitter.com/_akhaliq/status/1739508649939157308][DreamDistribution]]: Prompt Distribution Learning for Text-to-Image Diffusion Models ==best==
  - finds a prompt distribution from reference images, then uses it to generate new 2D/3D instances
*** 3D-AWARE IMAGE EDITING
:PROPERTIES:
:ID: b4052ea2-df86-4c37-91b0-e2c2448ab08c
:END:
- [[https://three-bee.github.io/triplane_edit/][Reference-Based]] 3D-Aware Image Editing with Triplane
  - leveraging 3D-aware triplanes, the edits are versatile, allowing rendering from various viewpoints
* TEXTURES
:PROPERTIES:
:ID: ac3e4d94-22ef-4967-96ac-75cd07b01a91
:END:
- [[id:63fb85fb-85cf-469c-be65-726fc0f3ff5d][TEXTURE]]
- [[https://twitter.com/_akhaliq/status/1701814103377871037][Learning Disentangled]] Avatars with Hybrid 3D Representations
  - hair, face, body and clothing can be learned, then disentangled yet jointly rendered
- [[https://twitter.com/_akhaliq/status/1738024636527395316][Paint]] Anything 3D with Lighting-Less Texture Diffusion Models (Tencent) ==best==
  - lighting-less textures; an image is painted over the input model, generating texture from text
** SCENE TEXTURES
:PROPERTIES:
:ID: a4f8fda0-bd4a-42a8-aad0-7a256a696bcd
:END:
- [[id:802df88e-d2f7-4849-9def-43190e1cebde][SCENE SYNTHESIS]]
- [[https://twitter.com/_akhaliq/status/1638380868526899202][Text2Room]]: [[https://lukashoel.github.io/text-to-room/][Extracting]] Textured 3D Meshes from 2D Text-to-Image Models (textures)
- [[https://twitter.com/_akhaliq/status/1716308854428860662][DreamSpace]]: Dreaming Your Room Space with Text-Driven Panoramic Texture Propagation (VR)
  - coarse-to-fine panoramic texture generation that considers both geometry and texture cues
  - imagines a panorama, then propagates it with inpainting
** MATERIAL
- [[https://twitter.com/_akhaliq/status/1699329156020830690][ControlMat]]: A Controlled Generative Approach to Material Capture; diffusion
  - uncontrolled illumination as input; gets a variety of materials which could correspond to the input image
- [[https://twitter.com/_akhaliq/status/1716832344949358820][TexFusion]]: Synthesizing 3D Textures with Text-Guided Image Diffusion Models
  - synthesizes textures for given 3D geometries
  - aggregates the different denoising predictions on a shared latent texture map (see the aggregation sketch below)
- [[https://twitter.com/_akhaliq/status/1732231627156103374][Alchemist]]: Parametric Control of Material Properties with Diffusion Models
  - controls material attributes of objects like roughness, metallic, albedo, and transparency in real images
  - edits material properties in real-world images while preserving all other attributes
- [[https://lemmy.dbzer0.com/post/12588882][TextureDreamer]]: Image-guided Texture Synthesis through Geometry-aware Diffusion
  - relightable textures from a small number of input images to target 3D shapes
- [[https://twitter.com/_akhaliq/status/1749674409739026627][UltrAvatar]]: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures
  - removes lighting effects so results can be rendered under various lighting conditions
- [[https://twitter.com/dreamingtulpa/status/1757388865340338347][Holo-Gen]]: by Unity, generates physically-based rendering (PBR) material properties for 3D objects
- [[https://twitter.com/_akhaliq/status/1760171462999048366][FlashTex]]: Fast Relightable Mesh Texturing with LightControlNet
  - textures an input 3D mesh from a prompt; high-quality and relightable textures
  - based on the ControlNet architecture
- [[https://arxiv.org/pdf/2311.05464.pdf][3DStyle-Diffusion]]: [[https://github.com/yanghb22-fdu/3DStyle-Diffusion-Official][Pursuing Fine-grained]] Text-driven 3D Stylization with 2D Diffusion Models ==best==
  - given an input mesh, uses 2D diffusion to generate coherent and relightable textures exploiting depth maps
- [[MESH TO MESH]]
- [[https://synctweedies.github.io/][SyncTweedies]]: A General Generative Framework Based on Synchronized Diffusions
  - denoising in multiple instance spaces
  - 3D mesh texturing, Gaussian splat texturing, depth-to-panorama
- [[https://arxiv.org/pdf/2404.02899.pdf][MatAtlas]]: Text-driven Consistent Geometry Texturing and Material Assignment
  - SD as a prior to texture a 3D model
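TexFusion-style texturing keeps one latent texture map in UV space and, at every denoising step, aggregates the per-view predictions back into it, weighted by how well each view sees each texel. A toy version of that aggregation step, assuming precomputed UV lookups and visibility weights per view; illustrative only, not the paper's code:

#+begin_src python
import torch

def aggregate_views_to_texture(view_latents, uv_coords, weights, tex_res=64, channels=4):
    """view_latents: list of (C, H, W) denoised latents, one per view.
    uv_coords: list of (H, W, 2) integer texel indices each pixel maps to.
    weights: list of (H, W) visibility / viewing-angle weights.
    Returns the (C, tex_res, tex_res) shared latent texture."""
    tex = torch.zeros(channels, tex_res, tex_res)
    norm = torch.zeros(tex_res, tex_res)
    for latent, uv, w in zip(view_latents, uv_coords, weights):
        u = uv[..., 0].reshape(-1).long()
        v = uv[..., 1].reshape(-1).long()
        idx = v * tex_res + u
        flat = latent.reshape(channels, -1) * w.reshape(1, -1)
        # scatter-add every weighted pixel into its texel
        tex.view(channels, -1).index_add_(1, idx, flat)
        norm.view(-1).index_add_(0, idx, w.reshape(-1))
    return tex / norm.clamp(min=1e-6)
#+end_src

After the aggregation, the shared texture is re-rendered into each view for the next denoising step, which is what keeps the views from drifting apart; SyncTweedies (above) generalizes the same synchronize-then-denoise loop to other instance spaces.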
*** FURTHER ENHANCEMENT
- [[https://arxiv.org/abs/2403.05102][Enhancing Texture]] Generation with High-Fidelity Using Advanced Texture Priors
  - a rough texture as the initial input, texture enhancement, eliminating noise and "point gaps"
*** HAIR
- [[https://twitter.com/_akhaliq/status/1682429544416813070][Neural]] Haircut: Prior-Guided Strand-Based Hair Reconstruction; realistic hair modeling, personalization
- [[https://twitter.com/_akhaliq/status/1686415895277588484][CT2Hair]]: High-Fidelity 3D Hair Modeling using Computed Tomography
  - real-world hair wigs as input, populates dense strands
- [[https://twitter.com/_akhaliq/status/1737324311877214228][HAAR]]: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
  - latent diffusion model that operates in a common hairstyle UV space
*** CLOTH
:PROPERTIES:
:ID: 7ed28066-314a-44f4-94d3-d1dc73aeb3df
:END:
- parent: [[id:82127d6a-b3bb-40bf-a912-51fa5134dacc][diffusion]]
- [[id:778f3c47-0420-41be-b5bb-f4d4c7f23cb9][DANCING]]
- [[https://arxiv.org/pdf/2305.11870][Chupa]]: Carving 3D Clothed Humans from Skinned Shape Priors using 2D Diffusion Probabilistic Models
  - clothed-mesh diffusion driven by text generation
- [[https://twitter.com/_akhaliq/status/1669578320650838016][TryOnDiffusion]]: A Tale of Two UNets (dress one person with clothes from another, pose/body change)
- [[https://github.com/johannakarras/DreamPose][DreamPose]]: Fashion Image-to-Video Synthesis via Stable Diffusion
- [[https://arxiv.org/abs/2402.01516][X-MDPT]]: Cross-view Masked Diffusion Transformers for Person Image Synthesis
  - changes the pose while keeping the clothes; employs masked diffusion transformers on latent patches
- [[https://twitter.com/xiaohuggg/status/1759876272505942462][OOTDiffusion]]: [[https://github.com/levihsu/OOTDiffusion][A highly]] controllable open source tool for virtual clothing try-on ==best one==
- [[https://github.com/snuvclab/GALA][GALA]]: Generating Animatable Layered Assets from a Single Scan
  - from one scan, gets all the clothes as independent, separated, segmented pieces/parts
- [[https://arxiv.org/abs/2401.16465][DressCode]]: Autoregressively Sewing and Generating Garments from Text Guidance
  - generates sewing patterns with text guidance, also for editing
**** GEN MESH
- [[https://twitter.com/_akhaliq/status/1773188761959927861][Garment3DGen]]: 3D Garment Stylization and Texture Generation
  - synthesizes 3D garment assets from a base mesh given a single input image as guidance
- [[https://jiali-zheng.github.io/Design2Cloth/][Design2Cloth]]: 3D Cloth Generation from 2D Masks
  - synthesizes decomposed 3D meshes from a single image
** FACE
:PROPERTIES:
:ID: 0d5c12dd-2d71-436c-a058-cef42d0f1e20
:END:
- [[id:ea70f586-4414-4af8-8328-a3d7e891d7f2][GAUSSIAN FACE]] [[id:5a81561e-9e1c-4fc8-bfba-de467b4de033][NERF FACE]]
- [[id:4ee8ad27-c3eb-43db-9086-d689ad44b2c6][HEAD POSE]] face-style swap, [[id:ffcb8293-fcbe-4531-8d9e-87cbe68fd4b5][GET HEAD POSE]]
- [[https://arxiv.org/pdf/2305.04469.pdf][HACK]]: Learning a Parametric Head and Neck Model for High-fidelity Animation
- [[https://arxiv.org/pdf/2305.07514.pdf][BlendFields]]: Few-Shot Example-Driven Facial Modeling
*** AVATAR
:PROPERTIES:
:ID: 38d229fc-ae98-4e26-be4c-99f466a45971
:END:
- [[https://twitter.com/_akhaliq/status/1692068192162820212][TeCH]]: Text-guided Reconstruction of Lifelike Clothed Humans
  - a diffusion model imagines the unseen parts, outputs geometry and texture
- [[https://twitter.com/_akhaliq/status/1691930254611497206][Relightable]] and Animatable Neural Avatar from Sparse-View Video
  - relightable neural avatars from monocular inputs; pose AND surface intersection, light visibility
- [[https://seeavatar3d.github.io/][SEEAvatar]]: Photorealistic Text-to-3D Avatar Generation with Constrained Geometry and Appearance
  - geometry is constrained to a human-body prior; high-quality meshes and textures
- [[https://huggingface.co/papers/2305.19245][AlteredAvatar]]: Stylizing Dynamic 3D Avatars with Fast Style Adaptation
- [[https://huggingface.co/papers/2312.11461][GAvatar]]: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
  - SDF-based implicit mesh learning; primitive-based 3D Gaussian representation to facilitate animation
- [[https://arxiv.org/abs/2401.00711][Text2Avatar]]: Text to 3D Human Avatar Generation with Codebook-Driven Body Controllable Attribute
  - intermediate codebook features
- [[https://arxiv.org/abs/2401.15348][AniDress]]: Animatable Loose-Dressed Avatar from Sparse Views Using Garment Rigging Model
  - generates animatable human avatars in loose clothes from sparse multi-view videos
- [[https://arxiv.org/abs/2402.02369][M3Face]]: A Unified Multi-Modal Multilingual Framework for Human Face Generation and Editing
  - multi-modal framework for face generation and editing
- [[https://arxiv.org/abs/2402.11909][One2Avatar]]: Generative Implicit Head Avatar For Few-shot User Adaptation ==best==
  - 3D animatable photo-realistic head avatar as a prior, personalizable with few shots (one image)
- [[id:a8c7edef-f00a-42ad-9511-086c23948c86][MAGICMIRROR]]: [[https://syntec-research.github.io/MagicMirror/][style]] and subject transformations ==best==
**** EXPRESSION
- [[https://real3dportrait.github.io/][Real3D-Portrait]]: One-shot Realistic 3D Talking Portrait Synthesis
  - generates a talking portrait video, estimating expression and head pose
- [[https://sites.google.com/view/media2face][Media2Face]]: Co-speech Facial Animation Generation With Multi-Modality Guidance
  - annotated emotional and style labels; auto-encoder decoupling expressions and identities
- [[https://arxiv.org/abs/2402.13724][Bring Your]] Own Character: A Holistic Solution for Automatic Facial Animation Generation of Customized Characters ==best==
  - automatically animates virtual human faces
***** AUDIO TO VIDEO
- [[https://twitter.com/_akhaliq/status/1762686465777999932][EMO]]: Emote Portrait Alive ==best==
  - audio to video (directly)
- [[https://twitter.com/_akhaliq/status/1772926152698396709][AniPortrait]]: Audio-Driven Synthesis of Photorealistic Portrait Animation
  - driven by audio and a reference portrait image
*** FACE DIFFUSION
- parent: [[id:82127d6a-b3bb-40bf-a912-51fa5134dacc][diffusion]]
- [[https://arxiv.org/abs/2305.06077][Relightify]]: Relightable 3D Faces from a Single Image via Diffusion Models
  - creates from a photo a set of textures (BRDF) to generate a realistic 3D face
- [[https://arxiv.org/abs/2402.07370][SelfSwapper]]: Self-Supervised Face Swapping via Shape Agnostic Masked AutoEncoder
  - mitigates identity leakage by masking facial regions and utilizing disentangled identity and non-identity features
* MESH DIFFUSION
- [[id:82127d6a-b3bb-40bf-a912-51fa5134dacc][diffusion]] [[id:e05c8e78-1579-42a1-8202-30c58b96e504][GAUSSIAN]]
- [[https://github.com/lzzcd001/MeshDiffusion/][MeshDiffusion]] [[https://meshdiffusion.github.io/][paper]]: 3D Mesh Modeling
- [[https://huggingface.co/papers/2305.10853][LDM3D]]: [[https://arxiv.org/pdf/2305.10853.pdf][Latent Diffusion]] Model for 3D; normal Stable Diffusion with depth map generation ==best==
- [[id:d9dfa74e-0833-46af-8b79-1135cd38e21e][LUCIDDREAMER]]
** TEXT-TO-3D
- [[https://twitter.com/_akhaliq/status/1719230080809828378][Text-to-3D]] with classifier score distillation ==best==
  - guidance alone is enough for effective text-to-3D generation (see the SDS sketch at the end of this list)
- [[https://twitter.com/_akhaliq/status/1732225883274899777][StableDreamer]]: Taming Noisy Score Distillation Sampling for Text-to-3D ==best quality== (Gaussians)
  - image-space diffusion for geometric precision, latent-space diffusion for vivid color
  - high-fidelity (quality) 3D models
- [[https://twitter.com/_akhaliq/status/1734052737229185041][Text-to-3D]] Generation with Bidirectional Diffusion using both 2D and 3D priors ==best==
  - both a 3D and a 2D diffusion process (priors), bidirectional guidance (20 minutes)
- [[https://twitter.com/_akhaliq/status/1735518720125014489][UniDream]]: Unifying Diffusion Priors for Relightable Text-to-3D Generation
  - albedo-normal aligned multi-view diffusion (to enable relighting)
- [[https://arxiv.org/abs/2312.11417][PolyDiff]]: Generating 3D Polygonal Meshes with Diffusion Models
  - operates natively on the polygonal mesh, trained to restore the original mesh structure
- [[LRM]]
- [[https://aigc3d.github.io/richdreamer/][RichDreamer]]: [[https://github.com/modelscope/RichDreamer][A Generalizable]] [[https://github.com/modelscope/RichDreamer][Normal-Depth]] Diffusion Model for Detail Richness in Text-to-3D
  - trained with extra image-to-depth and image-to-normal priors; the maps are diffused together
- [[https://twitter.com/_akhaliq/status/1747501047705194655][HexaGen3D]]: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation (7 seconds)
- [[https://snap-research.github.io/AToM/][AToM]]: Amortized Text-to-Mesh using 2D Diffusion
  - high-quality textured meshes, 1-second inference, 10x reduction in training cost, unseen prompts
- L3GO: [[https://twitter.com/_akhaliq/status/1757960242090946794][Language Agents]] with Chain-of-3D-Thoughts for Generating Unconventional Objects
  - composes a desired object via trial-and-error within a 3D simulation environment
- [[https://twitter.com/_akhaliq/status/1766120260220948715][Instant Text-to-3D]] Mesh with PeRFlow-T2I + TripoSR
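Most of the optimization-based entries above rely on Score Distillation Sampling: render a view of the 3D representation, noise it, ask the frozen 2D diffusion model which direction denoises it toward the prompt, and push that gradient back into the 3D parameters. A condensed sketch of one update, assuming a generic epsilon-prediction ~unet(latents, t, text_emb)~, a differentiable ~render(params, camera)~, a VAE-style ~encode~, and a ~sample_random_camera~ helper; all of these are placeholders, not a specific library API:

#+begin_src python
import torch

def sds_step(params, render, encode, unet, text_emb, alphas_cumprod,
             sample_random_camera, cfg_scale=25.0):
    """One Score Distillation Sampling update on the 3D parameters `params`."""
    camera = sample_random_camera()                  # placeholder camera sampler
    image = render(params, camera)                   # differentiable render, (1, 3, H, W)
    latents = encode(image)                          # e.g. VAE encode to latent space
    t = torch.randint(20, 980, (1,))                 # random diffusion timestep
    noise = torch.randn_like(latents)
    a = alphas_cumprod[t].view(1, 1, 1, 1)
    noisy = a.sqrt() * latents + (1 - a).sqrt() * noise
    with torch.no_grad():                            # the 2D prior stays frozen
        eps_text = unet(noisy, t, text_emb)
        eps_uncond = unet(noisy, t, None)
        eps = eps_uncond + cfg_scale * (eps_text - eps_uncond)
    w = 1 - a                                        # a common weighting choice
    grad = w * (eps - noise)                         # SDS gradient w.r.t. the latents
    # inject the gradient without backpropagating through the UNet
    loss = (latents * grad.detach()).sum()
    loss.backward()                                  # flows into params via render+encode
#+end_src

The "classifier score distillation" entry at the top of this list argues that the guidance term (eps_text - eps_uncond) alone is already sufficient; StableDreamer's image-space/latent-space split and the bidirectional 2D+3D variant are different choices of where this gradient is computed.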
*** VOLUME DIFFUSION
:PROPERTIES:
:ID: b95dc13d-0509-4f97-9597-34393c075799
:END:
- [[https://twitter.com/_akhaliq/status/1727756509142986828][WildFusion]]: Learning 3D-Aware Latent Diffusion Models in View Space
  - trained without direct supervision from multiview or 3D data and doesn't require pose or camera distributions
  - an autoencoder captures the images' underlying 3D structure (unlike previous GAN approaches)
- [[https://arxiv.org/abs/2312.11459][VolumeDiffusion]]: [[https://github.com/tzco/VolumeDiffusion][Flexible]] Text-to-3D Generation with Efficient Volumetric Encoder
  - 3D latent representation, seconds to minutes
** CHARACTER
- [[https://avatarverse3d.github.io/][AvatarVerse]]: High-quality & Stable 3D Avatar Creation from Text and Pose
  - text descriptions and pose guidance
  - uses an MLP NeRF
- [[https://twitter.com/xinhuang_/status/1709484346258997379][HumanNorm]]: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
- Fast Registration of Photorealistic Avatars for VR Facial Animation
- Synthesizing Moving People with 3D Control
** FROM 2D DIFFUSION
- [[id:505848e8-02a5-4699-be28-6e7b2e91837c][MULTIVIEW DIFFUSION]]
- [[https://github.com/KU-CVLAB/3DFuse][Let 2D Diffusion]] Model Know 3D-Consistency for Robust Text-to-3D Generation
- [[https://jeffreyxiang.github.io/ivid/][3D-aware]] [[https://arxiv.org/abs/2303.17905][Image Generation]] using 2D Diffusion Models
- [[https://twitter.com/_akhaliq/status/1715242827150393433][Enhancing High-Resolution]] 3D Generation through Pixel-wise Gradient Clipping
  - integrates into existing 3D generative models, enhancing texture synthesis (see the clipping sketch below)
  - for image generation with a Latent Diffusion Model (LDM)
- [[id:d1d1a9ff-670e-4bed-9087-ad0b8b71ee7a][CONTROLNET FOR 3D]] ==best==
- [[HYPERDREAMER]] [[id:277f7cda-963c-48ff-8e43-169986d8cff6][DEPTH DIFFUSION]] [[MESH TO MESH]]
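Pixel-wise gradient clipping bounds the per-pixel norm of the gradient flowing from the 2D prior into the 3D representation, so a few outlier pixels cannot dominate an SDS-style update. A small hedged sketch of the operation (the paper's exact rule may differ):

#+begin_src python
import torch

def pixelwise_clip(grad, max_norm=1.0, eps=1e-8):
    """grad: (B, C, H, W) gradient on a rendered image or latent.
    Rescales each pixel's C-dimensional gradient to at most max_norm."""
    norm = grad.norm(dim=1, keepdim=True)                 # (B, 1, H, W)
    scale = (max_norm / (norm + eps)).clamp(max=1.0)      # only shrink, never grow
    return grad * scale

# e.g. applied to the SDS gradient from the sketch above before backprop:
# grad = pixelwise_clip(w * (eps - noise), max_norm=0.1)
#+end_src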
*** 3D PRIOR
- [[https://github.com/baaivision/GeoDream/][GeoDream]]: [[https://twitter.com/thibaudz/status/1731703406354432394/photo/1][Disentangling]] 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation
  - multi-view diffusion serves as a native 3D geometric prior
  - disentangling 2D and 3D priors allows refining the 3D geometric prior further
- [[https://twitter.com/_akhaliq/status/1740581737422368914][DiffusionGAN3D]]: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors
  - diffusion guides the 3D generator finetuning with an informative direction
- [[https://twitter.com/_akhaliq/status/1767031482302816411][CRM]]: [[https://twitter.com/dreamingtulpa/status/1768930432072954229][Single]] Image to 3D Textured Mesh with Convolutional Reconstruction Model
  - Convolutional Reconstruction Model (CRM), feed-forward single image-to-3D generative model
  - the triplane exhibits spatial correspondence with six orthographic images
- [[https://stellarcheng.github.io/Sculpt3D/][Sculpt3D]]: [[https://github.com/StellarCheng/Scuplt_3d/tree/main][Multi-View]] Consistent Text-to-3D Generation with Sparse 3D Prior
  - modulates the output of the 2D diffusion model to match the correct patterns of the template views
- [[id:58953d7d-4769-4476-aa72-ced6f6f4d95b][ISOTROPIC3D]]
- [[https://twitter.com/_akhaliq/status/1770658449094754752][Compress3D]]: a Compressed Latent Space for 3D Generation from a Single Image
  - encodes 3D models into a compact triplane latent space; 7-second diffusion
**** 3D ADDED TO 2D PRIOR
- [[https://ku-cvlab.github.io/RetDream/][Retrieval-Augmented]] Score Distillation for Text-to-3D Generation ==best==
  - adapts the diffusion model's 2D prior toward view consistency
  - added controllability and negligible training cost
**** VIEWDIFF
- [[https://lemmy.dbzer0.com/post/15700946][ViewDiff]]: 3D-Consistent Image Generation with Text-to-Image Models
  - autoregressive 3D-consistent images at any viewpoint
  - integrates 3D volume-rendering and cross-frame-attention layers into each block of SD
  - *the prior could instead be an animation prior?* since it puts attention on the batch
  - 3D-consistent space
*** FAST
- [[https://twitter.com/_akhaliq/status/1723911920552456548][Instant3D]]: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model ==best, fast==
  - generates 4 views and then regresses a NeRF from them
- [[https://twitter.com/_akhaliq/status/1724640810858127815][Instant3D]]: Instant Text-to-3D Generation
- [[https://twitter.com/_akhaliq/status/1724639905261781017][One-2-3-45++]]: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion
  - multi-view image generation, then to 3D using multi-view 3D native diffusion models
- [[https://twitter.com/_akhaliq/status/1726427966878441874][MetaDreamer]]: Efficient Text-to-3D Creation With Disentangling Geometry and Texture (2 stages, within 20 minutes)
* MESH NERF
- parent: [[id:f5d2ef09-1412-4955-a3c5-c22f6fff8d11][nerf]]
- [[https://autonomousvision.github.io/sdfstudio/][A Unified]] [[https://github.com/autonomousvision/sdfstudio][Framework]] for Surface Reconstruction; 3D from NeRF
- Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures; [[https://github.com/eladrich/latent-nerf][latent nerf]]
- [[https://twitter.com/_akhaliq/status/1678946803343925248][AutoDecoding]] Latent 3D Diffusion Models; view-consistent appearance and geometry
- [[https://twitter.com/weiyuli99072112/status/1711407222272774191][SweetDreamer]] [[https://sweetdreamer3d.github.io/][converts]] 2D drawings from certain models into 3D by fixing geometry-related issues
- [[https://twitter.com/_akhaliq/status/1727197637617557891][SuGaR]]: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
  - regularization term that encourages the Gaussians to align well with the surface of the scene
  - and binds them, which enables easy editing, sculpting, rigging, animating, compositing and relighting
- NERF [[id:2a24e0fc-aa45-416c-bcce-7ecf55409e88][FROM TEXT]] [[id:d9dfa74e-0833-46af-8b79-1135cd38e21e][LUCIDDREAMER]]