📓 nodes/20230628195140-computer_vision.org by @tekakutli-org ☆

CUSTOMIZATION

MAP AS OUTPUT

LEARNING FROM VIDEO

DOCUMENTS

TOKENIZER

QUERYING MODELS - MULTIMODAL

WITH OTHER VISUAL SOMETHING

WEB MOCKING

GROUNDING

2D+ VISION

3D VISION

VIDEO VISION

CLASSIFICATION

CAPTIONING :CLIPREGION:

CAPTIONING VIDEO

REGIONS

IMAGE CLUSTERING

DIFFUSION FEATURES

AUDIO VISION

WHISPER