parent: computer_vision
ArtLine: GAN that extracts line art from an image; maybe use it instead of Canny edges for ControlNet conditioning?
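A minimal numpy sketch of what an edge-conditioning map looks like, using Sobel gradient magnitude as a crude stand-in for Canny (the threshold and toy image are illustrative, not ArtLine's output):

```python
import numpy as np

def sobel_edges(img: np.ndarray, thresh: float = 0.25) -> np.ndarray:
    """Binary edge map from gradient magnitude (rough Canny stand-in)."""
    img = img.astype(np.float64)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx[y, x] = np.sum(patch * kx)
            gy[y, x] = np.sum(patch * ky)
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-8  # normalize to [0, 1]
    return (mag > thresh).astype(np.uint8)

# toy image: dark left half, bright right half -> one vertical edge
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
```

ArtLine would replace this hand-crafted filter with learned line art, but the conditioning input handed to ControlNet is the same kind of single-channel edge image.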
Emergence of Segmentation with Minimalistic White-Box Transformers
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution ==best==
infers boundary structure, including contours, corners, and junctions
MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation
leverages token similarity to get by with fewer tokens while maintaining multi-scale token interaction
Materialistic Selecting Similar Materials in Images
Background Prompting for Improved Object Depth
learned background prompt, so the model focuses on the object
LISA: Reasoning Segmentation via Large Language Model
Language Instructed Segmentation Assistant, speak to it and it segments
SegGPT: Segmenting Everything In Context
Painter & SegGPT Series: Vision Foundation Models from BAAI (radiography components, top of box)
Grounding Everything Emerging Localization Properties in Vision-Language Transformers
CLIP can perform zero-shot open-vocabulary segmentation; probability-like similarity maps
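A toy sketch of the zero-shot idea: label image patches by cosine similarity between patch embeddings and class-name text embeddings (the 2-D vectors here are made up for illustration, not real CLIP features):

```python
import numpy as np

def label_patches(patch_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Assign each patch the class whose text embedding is most similar.
    patch_emb: (N, D) patch features; text_emb: (C, D) class-name features."""
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = p @ t.T  # (N, C) cosine similarities
    return sim.argmax(axis=1)

# toy embeddings: two patches, each aligned with a different "class"
text = np.array([[1.0, 0.0], [0.0, 1.0]])
patches = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = label_patches(patches, text)
```

Softmaxing `sim` per patch is what gives the "probability-like" per-class maps; grouping the per-patch labels spatially yields a coarse segmentation.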
CartoonSegmentation: Instance-guided Cartoon Editing with a Large-scale Dataset (anime fine details) ==best==
Tracking Any Object Amodally
comprehend complete objects from partial visibility; boxes for occluded objects
CutLER: object detection and segmentation
Detecting censors with deep learning and computer vision; locates them (to later inpaint over them)
U2Seg: Unsupervised Universal Image Segmentation (vs CutLER) ==best==
clustering of pseudo semantic labels
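A minimal sketch of the pseudo-labeling step: cluster per-pixel features with k-means and treat cluster ids as pseudo semantic labels (toy 2-D features and a deterministic spread-out init; not U2Seg's actual pipeline):

```python
import numpy as np

def kmeans_labels(feats: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Minimal k-means: per-point features -> pseudo semantic labels."""
    # deterministic init: pick k points spread across the array
    idx = np.linspace(0, len(feats) - 1, k).astype(int)
    centers = feats[idx].copy()
    labels = np.zeros(len(feats), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(feats[:, None] - centers[None], axis=2)  # (N, k)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(axis=0)
    return labels

# two well-separated toy clusters of "pixel features"
feats = np.vstack([
    np.random.default_rng(1).normal(0.0, 0.1, (20, 2)),
    np.random.default_rng(2).normal(3.0, 0.1, (20, 2)),
])
labels = kmeans_labels(feats, k=2)
```

The resulting cluster ids play the role of pseudo semantic labels that a segmentation model is then trained against.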
3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
fine-tunes (ControlNet) a 2D diffusion model to perform novel view synthesis from a single image (using an epipolar warp operator) ==best==
3D detection and identifying cross-view point correspondences
NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
indoor 3D detection (and depth) with images as input; generalizes to unseen scenes without per-scene optimization
EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
captures scene geometry, appearance, and motion; represents highly dynamic scenes self-sufficiently
multi-granularity segmentation, instantaneous (unlike SA3D)
GARField: Group Anything with Radiance Fields
use sam 2D masks, coarse-to-fine hierarchy
SAM + DINO: Segment Anything, image region editing
Recognize Anything: A Strong Image Tagging Model
https://arxiv.org/abs/2304.06718 Segment-Everything-Everywhere-All-At-Once
Semantic-SAM: Segment and Recognize Anything at Any Granularity
generate masks at multiple levels
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
CLIP-like real-world recognition
Learning to Prompt Segment Anything Models
optimizes the prompts using few-shot data
Fast Segment Anything (FastSAM), 40 ms per image, on PyPI
EfficientSAM: 20x fewer parameters and 20x faster runtime
SlimSAM: 0.1% Data Makes Segment Anything Slim
0.9% (5.7M) of the parameters, 0.1% of the data
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
knowledge distillation to produce a lightweight student model
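A minimal numpy sketch of the standard distillation loss: temperature-softened KL divergence between teacher and student logits, scaled by T^2. This is the generic recipe, not TinySAM's exact objective:

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with temperature T."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T: float = 2.0) -> float:
    """KL(teacher || student) on T-softened distributions, scaled by T^2."""
    p = softmax(np.asarray(teacher_logits, dtype=np.float64), T)
    q = softmax(np.asarray(student_logits, dtype=np.float64), T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())

teacher = np.array([[4.0, 1.0, 0.0]])
loss_same = distill_loss(teacher, teacher)  # identical logits -> zero loss
```

Higher T softens both distributions so the student also learns the teacher's ranking of the non-top classes; the T^2 factor keeps gradient scale comparable across temperatures.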
segment videos https://github.com/gaomingqi/Track-Anything
Tracking Anything with Decoupled Video Segmentation
Video Instance Matting
estimates an alpha matte for each instance at each frame of a video sequence
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
unifies four reference-based object segmentation tasks in a single architecture (box or area from a prompt)
Lester: rotoscope animation through video object segmentation and tracking
masks and tracks across frames
Matting Anything Model (MAM): green-screen-style matting
TokenCompose: enhanced prompting
RelateAnything: sees relationships between objects
Osprey: Pixel Understanding with Visual Instruction Tuning; understand everything for SAM
click on a cluster of pixels and get a description of it
Segment Anything Meets Point Tracking; follows pixels, OPTICAL FLOW
DreamTeacher: Pretraining Image Backbones with Deep Generative Models
follows 3D concepts with 3D understanding
parent: stable_diffusion
Diffusion Models as Masked Autoencoders
ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Diffusion Models for Zero-Shot Open-Vocabulary Segmentation (considers the contextual background)
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
generates synthetic labeled data for rare and novel categories, then teaches segmentation on it
FIND: Interfacing Foundation Models' Embeddings
segments and correlates to prompt tokens
SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process ==best==
augments segmentation accuracy by denoising the mask (exceedingly fine details)
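A toy analogue of mask refinement: iteratively flip each pixel toward its 3x3 neighbourhood majority, which fills speckle holes and removes stray pixels. This is plain majority voting for illustration, not SegRefiner's discrete diffusion process:

```python
import numpy as np

def refine_mask(mask: np.ndarray, steps: int = 3) -> np.ndarray:
    """Iteratively set each pixel to its 3x3 neighbourhood majority."""
    m = mask.astype(np.uint8).copy()
    h, w = m.shape
    for _ in range(steps):
        padded = np.pad(m, 1, mode="edge")
        votes = np.zeros((h, w), dtype=np.int32)
        for dy in range(3):
            for dx in range(3):
                votes += padded[dy:dy + h, dx:dx + w]
        m = (votes >= 5).astype(np.uint8)  # majority of the 9 cells
    return m

# a solid square with one speckle hole inside and one stray pixel outside
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:8, 2:8] = 1
mask[4, 4] = 0   # hole inside the square
mask[0, 9] = 1   # isolated noise pixel
clean = refine_mask(mask)
```

SegRefiner replaces this fixed local rule with a learned denoising process, which is why it can recover exceedingly fine details instead of just smoothing.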
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
identifies correspondences between pixels and latent space features
FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models
uses a diffusion model and an image captioning model, both frozen
3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
view synthesis conditioned on a single image using an epipolar warp operator
3D-aware features for 3D detection, identifying cross-view point correspondences
AudioSep: Separate Anything You Describe, a Separate Anything Audio Model
Segment Anything in 3D with NeRFs (SA3D)
SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model
SAD is able to perform 3D segmentation (segment out any 3D object) with RGBD inputs
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking (ConvNeXt)
predicts objects directly from sparse voxel features
no sparse-to-dense conversion, anchors, or center proxies needed anymore
use: 2D segmentation mask into 3D boxes: code
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects
iSeg: Interactive 3D Segmentation via Interactive Attention
based on positive and negative clicks directly on the shape's surface
into point cloud:
SuperPrimitive: Scene Reconstruction at a Primitive Level
splits images into semantically correlated local regions, then enhances them with surface normals
tasks: depth completion (per pixel), few-view structure from motion, and monocular dense visual odometry (recovering camera angles)
LangSplat: 3D Language Gaussian Splatting
grounds CLIP features into 3D language Gaussians; faster than LERF
SA-GS: Segment Anything in 3D Gaussians
without any training process or learned parameters
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow (video optical flow)
OmniMotion: Tracking Everything Everywhere All at Once (following pixels, optical flow)
INVE: Interactive Neural Video Editing; paint pixels, then follow them
Tracking Anything in High Quality
a pretrained mask-refinement (MR) model is employed to refine the tracking result
CoTracker: models correlations of points through time using attention
can track every pixel or a selected set
generates rainbow visualizations from a set of point tracks
SpatialTracker: Tracking Any 2D Pixels in 3D Space
handles occlusions and discontinuities in 2D, mitigating issues caused by image projection
uses monocular depth estimators
parent: diffusion
The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation
ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning
freezes model parameters, fine-tuning a small set of prompt embeddings
addresses both catastrophic forgetting and plasticity
significantly reduces the trainable parameters