parent: computer_vision
ArtLine: GAN that extracts line art from an image; maybe use it instead of Canny edges for ControlNet conditioning?
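A minimal numpy sketch of what an edge-conditioning map looks like, using Sobel gradient magnitude as a crude stand-in for Canny (the threshold and toy image are illustrative, not ArtLine's output):

```python
import numpy as np

def sobel_edges(img: np.ndarray, thresh: float = 0.25) -> np.ndarray:
    """Binary edge map from gradient magnitude (rough Canny stand-in)."""
    img = img.astype(np.float64)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    h, w = img.shape
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx[y, x] = np.sum(patch * kx)
            gy[y, x] = np.sum(patch * ky)
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-8  # normalize to [0, 1]
    return (mag > thresh).astype(np.uint8)

# toy image: dark left half, bright right half -> one vertical edge
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edges = sobel_edges(img)
```

ArtLine would replace this hand-crafted filter with learned line art, but the conditioning input handed to ControlNet is the same kind of single-channel edge image.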
Emergence of Segmentation with Minimalistic White-Box Transformers
Boundary Attention: Learning to Find Faint Boundaries at Any Resolution ==best==
infers boundary structure, including contours, corners, and junctions
MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation
leverages token similarity to get by with fewer tokens while maintaining multi-scale token interaction
Materialistic Selecting Similar Materials in Images
Background Prompting for Improved Object Depth
learned background prompt, so the model focuses on the object
LISA: Reasoning Segmentation via Large Language Model
Language Instructed Segmentation Assistant, speak to it and it segments
SegGPT: Segmenting Everything In Context
Painter & SegGPT Series: Vision Foundation Models from BAAI (radiography components, top of box)
Grounding Everything Emerging Localization Properties in Vision-Language Transformers
CLIP can perform zero-shot open-vocabulary segmentation; probability-like similarity maps
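A toy sketch of the zero-shot idea: label image patches by cosine similarity between patch embeddings and class-name text embeddings (the 2-D vectors here are made up for illustration, not real CLIP features):

```python
import numpy as np

def label_patches(patch_emb: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
    """Assign each patch the class whose text embedding is most similar.
    patch_emb: (N, D) patch features; text_emb: (C, D) class-name features."""
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = p @ t.T  # (N, C) cosine similarities
    return sim.argmax(axis=1)

# toy embeddings: two patches, each aligned with a different "class"
text = np.array([[1.0, 0.0], [0.0, 1.0]])
patches = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = label_patches(patches, text)
```

Softmaxing `sim` per patch is what gives the "probability-like" per-class maps; grouping the per-patch labels spatially yields a coarse segmentation.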
CartoonSegmentation: Instance-guided Cartoon Editing with a Large-scale Dataset (anime fine details) ==best==
Tracking Any Object Amodally
comprehend complete objects from partial visibility; boxes for occluded objects
CutLER: object detection and segmentation
Detecting censors with deep learning and computer vision; locates them (to later inpaint over them)
U2Seg: Unsupervised Universal Image Segmentation (vs CutLER) ==best==
clustering of pseudo semantic labels
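A minimal sketch of the pseudo-labeling step: cluster per-pixel features with k-means and treat cluster ids as pseudo semantic labels (toy 2-D features and a deterministic spread-out init; not U2Seg's actual pipeline):

```python
import numpy as np

def kmeans_labels(feats: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Minimal k-means: per-point features -> pseudo semantic labels."""
    # deterministic init: pick k points spread across the array
    idx = np.linspace(0, len(feats) - 1, k).astype(int)
    centers = feats[idx].copy()
    labels = np.zeros(len(feats), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(feats[:, None] - centers[None], axis=2)  # (N, k)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = feats[labels == j].mean(axis=0)
    return labels

# two well-separated toy clusters of "pixel features"
feats = np.vstack([
    np.random.default_rng(1).normal(0.0, 0.1, (20, 2)),
    np.random.default_rng(2).normal(3.0, 0.1, (20, 2)),
])
labels = kmeans_labels(feats, k=2)
```

The resulting cluster ids play the role of pseudo semantic labels that a segmentation model is then trained against.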
3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
fine-tunes (ControlNet) a 2D diffusion model to perform novel view synthesis from a single image (using an epipolar warp operator) ==best==
3D detection and identifying cross-view point correspondences
NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
indoor 3D detection (and depth) with images as input; generalizes to unseen scenes without per-scene optimization
EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
captures scene geometry, appearance, and motion; represents highly dynamic scenes self-sufficiently
multi-granularity segmentation, instantaneous (unlike SA3D)
GARField: Group Anything with Radiance Fields
use sam 2D masks, coarse-to-fine hierarchy
SAM + DINO: Segment Anything, image region editing
Recognize Anything: A Strong Image Tagging Model
https://arxiv.org/abs/2304.06718 Segment-Everything-Everywhere-All-At-Once
Semantic-SAM: Segment and Recognize Anything at Any Granularity
generate masks at multiple levels
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
CLIP-like real-world recognition
Learning to Prompt Segment Anything Models
optimizes the prompts using few-shot data
Fast Segment Anything (FastSAM), 40 ms per image, on PyPI
EfficientSAM: 20x fewer parameters and 20x faster runtime
SlimSAM: 0.1% Data Makes Segment Anything Slim
0.9% (5.7M) of the parameters, 0.1% of the data
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
knowledge distillation to produce a lightweight student model
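A minimal numpy sketch of the standard distillation loss: temperature-softened KL divergence between teacher and student logits, scaled by T^2. This is the generic recipe, not TinySAM's exact objective:

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    """Numerically stable softmax with temperature T."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T: float = 2.0) -> float:
    """KL(teacher || student) on T-softened distributions, scaled by T^2."""
    p = softmax(np.asarray(teacher_logits, dtype=np.float64), T)
    q = softmax(np.asarray(student_logits, dtype=np.float64), T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean())

teacher = np.array([[4.0, 1.0, 0.0]])
loss_same = distill_loss(teacher, teacher)  # identical logits -> zero loss
```

Higher T softens both distributions so the student also learns the teacher's ranking of the non-top classes; the T^2 factor keeps gradient scale comparable across temperatures.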
segment videos https://github.com/gaomingqi/Track-Anything
Tracking Anything with Decoupled Video Segmentation
Video Instance Matting
estimates an alpha matte for each instance at each frame of a video sequence
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
unifies four reference-based object segmentation tasks in a single architecture (box or area from a prompt)
Lester: rotoscope animation through video object segmentation and tracking
masks and tracks across frames
Matting Anything Model (MAM): green-screen-style matting
TokenCompose: enhanced prompting
RelateAnything: sees relationships between objects
Osprey: Pixel Understanding with Visual Instruction Tuning; understand everything for SAM
click on a cluster of pixels and get a description of it
Segment Anything Meets Point Tracking; follows pixels, OPTICAL FLOW
DreamTeacher: Pretraining Image Backbones with Deep Generative Models
follows 3D concepts with 3D understanding
parent: stable_diffusion
Diffusion Models as Masked Autoencoders
ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models
Diffusion Models for Zero-Shot Open-Vocabulary Segmentation (considers the contextual background)
MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation
generates synthetic labeled data for rare and novel categories, then teaches segmentation on it
FIND: Interfacing Foundation Models' Embeddings
segments and correlates to prompt tokens
SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process ==best==
augments segmentation accuracy by denoising the mask (exceedingly fine details)
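A toy analogue of mask refinement: iteratively flip each pixel toward its 3x3 neighbourhood majority, which fills speckle holes and removes stray pixels. This is plain majority voting for illustration, not SegRefiner's discrete diffusion process:

```python
import numpy as np

def refine_mask(mask: np.ndarray, steps: int = 3) -> np.ndarray:
    """Iteratively set each pixel to its 3x3 neighbourhood majority."""
    m = mask.astype(np.uint8).copy()
    h, w = m.shape
    for _ in range(steps):
        padded = np.pad(m, 1, mode="edge")
        votes = np.zeros((h, w), dtype=np.int32)
        for dy in range(3):
            for dx in range(3):
                votes += padded[dy:dy + h, dx:dx + w]
        m = (votes >= 5).astype(np.uint8)  # majority of the 9 cells
    return m

# a solid square with one speckle hole inside and one stray pixel outside
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:8, 2:8] = 1
mask[4, 4] = 0   # hole inside the square
mask[0, 9] = 1   # isolated noise pixel
clean = refine_mask(mask)
```

SegRefiner replaces this fixed local rule with a learned denoising process, which is why it can recover exceedingly fine details instead of just smoothing.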
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
identifies correspondences between pixels and latent space features
FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models
uses a diffusion model and an image captioning model, both frozen
3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features
view synthesis conditioned on a single image using an epipolar warp operator
3D-aware features for 3D detection, identifying cross-view point correspondences
AudioSep: Separate Anything You Describe, a Separate Anything Audio Model
Segment Anything in 3D with NeRFs (SA3D)
SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model
SAD is able to perform 3D segmentation (segment out any 3D object) with RGBD inputs
VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking (ConvNeXt)
predicts objects directly from sparse voxel features
no sparse-to-dense conversion, anchors, or center proxies needed anymore
use: 2D segmentation mask into 3D boxes: code
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects
iSeg: Interactive 3D Segmentation via Interactive Attention
based on positive and negative clicks directly on the shape's surface
into point cloud:
SuperPrimitive: Scene Reconstruction at a Primitive Level
splits images into semantically correlated local regions, then enhances them with surface normals
tasks: depth completion (per pixel), few-view structure from motion, and monocular dense visual odometry (recovering camera angles)
LangSplat: 3D Language Gaussian Splatting
grounds CLIP features into 3D language Gaussians; faster than LERF
SA-GS: Segment Anything in 3D Gaussians
without any training process or learned parameters
RAFT: Recurrent All-Pairs Field Transforms for Optical Flow (video optical flow)
OmniMotion: Tracking Everything Everywhere All at Once (following pixels, optical flow)
INVE: Interactive Neural Video Editing; paint pixels, then follow them
Tracking Anything in High Quality
a pretrained mask-refinement (MR) model is employed to refine the tracking result
CoTracker: models correlations of points through time using attention
can track every pixel or a selected set
generates rainbow visualizations from a set of point tracks
SpatialTracker: Tracking Any 2D Pixels in 3D Space
handles occlusions and discontinuities in 2D, mitigating issues caused by image projection
uses monocular depth estimators
parent: diffusion
The Surprising Effectiveness of Diffusion Models for Optical Flow and Monocular Depth Estimation
ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning
freezes model parameters, fine-tuning a small set of prompt embeddings
addresses both catastrophic forgetting and plasticity
significantly reduces the trainable parameters