:PROPERTIES:
:ID: cb192d74-71e5-40c3-8763-6f68ffde8e27
:END:
#+title: train
#+filetags: :neuralnomicon:
#+SETUPFILE: https://fniessen.github.io/org-html-themes/org/theme-readtheorg.setup

- multiple models working together:
  - diffusion: [[id:6a66690f-b76f-441a-a093-3c83ca73af2d][MULTIPLE DIFFUSION]], GIT RE-BASIN
  - text: RAD, EFT
  - logistic: Auto-Instruct
- [[https://twitter.com/_akhaliq/status/1678970340033150977][Self-Supervised]] Learning with Lie Symmetries for Partial Differential Equations
  - computationally efficient alternatives to numerical solvers
  - self-supervised learning of general-purpose representations of PDEs from heterogeneous data
- [[https://twitter.com/HarperSCarroll/status/1728061267624014219][Q*]]: [[https://www.youtube.com/watch?v=4qkKpNnSrlY&t=9][New Objective]]: Q-Learning and Q* - Decision Making Under Uncertainty (CS238/AA228)
  - Q-learning parallels biological reward neurocircuitry; reinforcement learning (RL)
- [[https://twitter.com/_akhaliq/status/1737716034088386604][Model-Based]] Control with Sparse Neural Dynamics (aggressive sparsification, distillation)
  - sparsify the network by removing redundant neurons; applicable to a wide variety of DNNs
- [[https://twitter.com/_akhaliq/status/1749289751058710685][Zero Bubble]] Pipeline Parallelism
  - algorithm for the optimal schedule given a configuration and memory limit
- [[https://twitter.com/_akhaliq/status/1767393991727657262][Fuyou]]: Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
  - training with a low-end GPU and limited CPU memory capacity
- [[https://twitter.com/_akhaliq/status/1777155790417150111][Direct Nash]] Optimization: Teaching Language Models to Self-Improve with General Preferences
  - post-trains an LLM using preference feedback from a teacher model to iteratively improve over itself
  - marries the simplicity and stability of contrastive learning with the theoretical generality of optimizing general preferences
- [[https://twitter.com/_akhaliq/status/1778592009067925649][From Words]] to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
  - rivals supervised methods such as Random Forest, Bagging, or Gradient Boosting
* RESEARCH
- [[https://arxiv.org/abs/1606.01981][Deep neural]] networks are robust to weight binarization and other non-linear distortions
  - 0.68 effective bits per weight (below 1-bit models)
  - points to the idea that a stochastic memory element can be used
** MATMUL FREE
:PROPERTIES:
:ID: b420e2cc-c219-43ef-baa6-e913a4690872
:END:
- [[https://twitter.com/rohanpaul_ai/status/1799122826114330866][Scalable]] [[https://github.com/ridgerchu/matmulfreellm][MatMul-free]] Language Modeling
  - replaces MatMul operations in dense layers with ternary accumulations using weights constrained to {-1, 0, +1} (see the sketch below)
  - reduces computational cost and memory while preserving network expressiveness
  - GPUs? they removed matmuls but still use the Hadamard product (element-wise product), which is also embarrassingly parallel and GPU-friendly
  - using flexible GPUs for training and FPGAs/ASICs for inference is the optimal tradeoff here
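A minimal sketch of the ternary trick, assuming a PyTorch setting (the absmean scaling and the inference-only style are assumptions, not the repo's exact BitLinear code): with weights constrained to {-1, 0, +1}, every "multiplication" reduces to keep / negate / drop followed by accumulation, which is what makes multiplier-free FPGA/ASIC inference attractive.

#+begin_src python
import torch

def ternarize(w: torch.Tensor):
    # Per-tensor absmean scale, then round weights into {-1, 0, +1}
    # (assumption: a BitNet-style recipe; training would also need a
    # straight-through estimator, omitted here).
    scale = w.abs().mean().clamp(min=1e-8)
    w_ternary = torch.clamp(torch.round(w / scale), -1, 1)
    return w_ternary, scale

def ternary_linear(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # y = x @ W^T with W in {-1, 0, +1}: every product is +x, -x, or 0,
    # so the dense layer is pure accumulation.  On a GPU it is still
    # expressed as a matmul; dedicated hardware needs no multipliers.
    w_ternary, scale = ternarize(w)
    return (x @ w_ternary.t()) * scale

x = torch.randn(2, 8)
w = torch.randn(16, 8)
print(ternary_linear(x, w).shape)  # torch.Size([2, 16])
#+end_src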
* SOFTWARE WISE
- optimizer from 32 bits to 8 bits
- https://github.com/pyg-team/pytorch_geometric
- faster matrix multiplication using approximations - https://github.com/dblalock/bolt
* WITH REWARD
- feedback: [[id:ad5a8c1e-10c2-4155-86fe-ecbfa1ffcd07][FEEDBACK AS TARGET]] [[id:59d1d337-eff3-42bb-9398-1e51b0739074][HUMAN FEEDBACK]] [[id:4daacc49-2790-49c2-a32a-880c5f99e681][PROPER-ING INSTRUCTIONS]]
- [[https://arxiv.org/abs/2310.03739][AlignProp]]: Aligning Text-to-Image Diffusion Models with Reward Backpropagation
  - aligns diffusion models to reward functions
- [[https://twitter.com/_akhaliq/status/1716305579101069478][CPL]]: Contrastive Preference Learning: Learning from Human Feedback without RL
  - learns optimal policies from preferences without learning reward functions
  - regret-based model of human preferences instead of reward
** CLIP AS REWARD
:PROPERTIES:
:ID: 9bec56a3-a402-418d-bc67-40b3165089c3
:END:
- [[https://twitter.com/_akhaliq/status/1715244883659661790][Vision-Language]] Models are Zero-Shot Reward Models for Reinforcement Learning
  - hand-written reward functions are often infeasible; reward models from human feedback are often very expensive
  - VLMs (CLIP) as reward models: a single-sentence text prompt describing the desired task
** REINFORCEMENT LEARNING
- [[https://twitter.com/_akhaliq/status/1717390896788873543][TD-MPC2]]: Scalable, Robust World Models for Continuous Control
  - a single agent performs 80 tasks across multiple task domains, embodiments, and action spaces
  - performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model
** LLM AS REWARD
:PROPERTIES:
:ID: 2fefb31b-1809-49c0-b925-a7b9a6fa3b0b
:END:
- [[https://twitter.com/arankomatsuzaki/status/1706311844829487153][Text2Reward]]: [[https://text-to-reward.github.io/][Automated]] Dense Reward Function Generation for Reinforcement Learning
  - automates the generation of dense reward functions with an LLM
- [[https://twitter.com/_akhaliq/status/1715184868294889490][Eureka]]: [[https://twitter.com/DrJimFan/status/1715397393842401440][Human-Level]] Reward Design via Coding Large Language Models
  - generates reward functions that outperform expert human-engineered rewards
  - complex skills can now be acquired via reinforcement learning by optimizing over the generated reward, enabling sequential decision-making tasks
  - in-context RLHF to incorporate feedback and steer/align the reward function
  - outer loop: an inference-only LLM instructs a learnable NN to refine the reward function
  - inner loop: reinforcement learning trains a controller
  - example: pen spinning
* STRUCTURE
- [[id:e261c214-31a2-4d93-a62b-61d7d53b702c][LORA]]
- [[https://github.com/facebookresearch/ConvNeXt][ConvNeXt]] (vs ViT, for image classification)
  - accurate, efficient, scalable and very simple in design
  - for: zero-shot image classification, image and text retrieval
  - CLIP ConvNeXt: https://huggingface.co/laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft (320x320)
- [[https://twitter.com/_akhaliq/status/1734422847915721136][CNCA]]: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
  - replaces linear recurrence with a special temporal convolutional network (see the sketch below this list)
  - permits a larger receptive field with shallower networks
  - reduces the computational complexity to O(L)
- [[https://twitter.com/_akhaliq/status/1741659775673184467][PanGu-π]]: Enhancing Language Model Architectures via Nonlinearity Compensation
  - a shortcut is used to enhance the model's nonlinearity; ~10% inference speed-up
  - nonlinearity is usually studied in convolutional networks for vision tasks
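A generic dilated causal temporal-convolution sketch, illustrating the receptive-field and O(L) points above (this is not CNCA's actual architecture; channel counts, kernel size, and depth are made up):

#+begin_src python
import torch
import torch.nn as nn

class DilatedCausalTCN(nn.Module):
    # Stack of dilated causal 1-D convolutions: per-layer cost is O(L) in the
    # sequence length, and doubling the dilation each layer makes the
    # receptive field grow exponentially with depth.
    def __init__(self, channels: int = 64, kernel_size: int = 4, num_layers: int = 6):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size,
                      dilation=2 ** i, padding=(kernel_size - 1) * 2 ** i)
            for i in range(num_layers)
        ])
        self.receptive_field = 1 + sum((kernel_size - 1) * 2 ** i for i in range(num_layers))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, channels, length)
        for conv in self.convs:
            h = conv(x)[..., : x.shape[-1]]  # trim right padding -> causal
            x = torch.relu(h) + x            # residual connection
        return x

tcn = DilatedCausalTCN()
print(tcn.receptive_field)                 # 190 timesteps from only 6 layers
print(tcn(torch.randn(1, 64, 256)).shape)  # torch.Size([1, 64, 256])
#+end_src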
** HYPERPARAMETER
- muP proposes the "right way to scale": an effective weight-init scheme for searching the optimal hyperparameters
* CLASSIFIER
** GZIP VS GPT
:PROPERTIES:
:ID: 316325a1-f24b-487d-9238-ca35db3a6b0c
:END:
- are LLMs just text compression algorithms?
- [[https://twitter.com/_akhaliq/status/1666644201705029632][LLMZip]]: Lossless Text Compression using Large Language Models
- gzip instead of parameters for classification
  - "[[https://aclanthology.org/2023.findings-acl.426.pdf][Low-Resource]]" Text Classification: A Parameter-Free Classification Method with Compressors
* SMALLER
** COMPRESSION
- [[https://lemmy.dbzer0.com/post/12260097][Knowledge Translation]]: A New Pathway for Model Compression
  - a teacher-student model that receives parameters and generates compressed ones
** QUANTIZATION
- [[id:385118af-d780-4cce-ad5a-46b3ecb11db7][DIFFUSION QUANTIZATION]]
- [[https://arxiv.org/pdf/2303.10512.pdf][AdaLoRA]] adaptively allocates the parameter budget among weight matrices according to their importance (adaptive LoRA)
- [[https://twitter.com/_akhaliq/status/1688791126080126976][FLIQS]]: One-Shot Mixed-Precision Floating-Point and Integer Quantization Search
  - mixed-precision quantization that eliminates the need for retraining
* OPTIMIZER
- [[https://twitter.com/DrJimFan/status/1625920782332489729][Lion]]: optimizer, better than Adam
- [[https://arxiv.org/abs/2302.03764][Sketchy]]: [[https://twitter.com/FeinbergVlad/status/1623540032832413696][Memory-efficient]] Adaptive Regularization with Frequent Directions
  - Kronecker-factored diagonal eigenvalues, Frequent Directions
* CHEAPNESS
- [[https://huggingface.co/papers/2307.03576][One Step of]] Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
- [[https://twitter.com/_akhaliq/status/1683670100350742528][Optimized]] Network Architectures for Large Language Model Training with Billions of Parameters
  - only small subgroups of GPUs require high-bandwidth any-to-any communication within them
* DATASET
:PROPERTIES:
:ID: 3b228325-e1af-4fc5-857b-fe5933e20b03
:END:
- [[id:aeca80bb-38f3-4343-a214-67e3b4df245e][CAPTIONING]]
- dimensionality reduction algorithms
  - t-SNE and UMAP had long been the favorites
  - "Deep TDA" combines self-supervised learning and Topological Data Analysis (TDA)
    - unlocks new insights from complex datasets
    - more robust to noise and outliers in the data
- [[https://twitter.com/_akhaliq/status/1732966139762819390][Gen2Det]]: Generate to Detect
  - directly generates scene-centric (synthetic) images
  - improves performance on rare categories
- [[https://arxiv.org/abs/2401.04441][Image]] classification network enhancement methods based on knowledge injection
  - a knowledge-injection dataset to improve interpretability and classification performance of hidden layers
- [[https://twitter.com/_akhaliq/status/1765052609952436422][MovieLLM]]: Enhancing Long Video Understanding with AI-Generated Movies
  - generates a script and the corresponding video as a dataset
** MISTAKES
- [[https://twitter.com/_akhaliq/status/1755784642827874416][In-Context]] Principle Learning from Mistakes
  - induce the model to make mistakes, reflect on those mistakes, and learn explicit task-specific "principles" from them that help avoid similar mistakes (see the sketch below)
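A rough sketch of that two-stage prompting idea (the =llm= callable, the prompt wording, and the exact-match correctness check are all made-up placeholders, not the paper's protocol):

#+begin_src python
def learn_principles(llm, train_examples):
    """Stage 1: collect the model's mistakes on a few labelled examples and
    ask it to distill explicit, task-specific principles from them."""
    mistakes = []
    for question, gold in train_examples:
        answer = llm(f"Question: {question}\nAnswer:")
        if answer.strip().lower() != gold.strip().lower():  # naive correctness check
            mistakes.append((question, answer, gold))
    report = "\n\n".join(
        f"Question: {q}\nWrong answer: {a}\nCorrect answer: {g}" for q, a, g in mistakes
    )
    return llm(
        "Below are mistakes made on this task, with corrections.\n\n"
        f"{report}\n\n"
        "Write a short list of general principles that would avoid such mistakes."
    )

def answer_with_principles(llm, principles, question):
    """Stage 2: prepend the learned principles to every test-time prompt."""
    return llm(
        f"Principles for this task:\n{principles}\n\n"
        f"Question: {question}\nAnswer:"
    )

# usage (hypothetical chat-completion callable `my_llm`):
# principles = learn_principles(my_llm, [("2+2?", "4"), ("3*3?", "9")])
# answer_with_principles(my_llm, principles, "7*8?")
#+end_src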
** ACTUAL DATASET
- [[https://huggingface.co/datasets/gvecchio/MatSynth][MatSynth]]: Physically Based Rendering (PBR) materials dataset (4,000 ultra-high resolution)
- [[https://browse.arxiv.org/abs/2402.01355][FindingEmo]]: An Image Dataset for Emotion Recognition in the Wild
  - annotated dimensions include valence, arousal and emotion
- [[https://twitter.com/storytracer/status/1765410706638160303][English public]] domain books
*** HANDS DATASET
:PROPERTIES:
:ID: 3f752b46-cae4-49d9-948d-50e3c500727e
:END:
- [[https://arxiv.org/abs/2401.15075][Annotated Hands]] for Generative Models
  - three additional channels provide annotations for the hands in the image, adding structure
** ENHANCEMENT
- [[id:f03ccf94-1aa5-4705-89af-617a22570e26][AUDIO VISION]]
- [[https://twitter.com/_akhaliq/status/1691914689926840809][Learning]] to Identify Critical States for Reinforcement Learning from Videos
  - mask-based sensitivity analysis to extract/identify important critical states
  - recognizes relevant states/actions/rewards from untagged videos
- [[https://twitter.com/_akhaliq/status/1716307038764933189][Let's Synthesize]] Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
  - uses an LLM to extrapolate the errors made by a small model trained on the synthesized dataset
- [[https://arxiv.org/abs/2312.02548][GeNIe]]: Generative Hard Negative Images Through Diffusion (synthetically enhanced dataset)
  - generates challenging samples for the target category
- DistDiff: [[https://github.com/haoweiz23/DistDiff][Distribution-Aware]] Data Expansion with Diffusion Models
  - dataset expansion framework based on a distribution-aware diffusion model
  - hierarchical prototypes to approximate the real data distribution
** SIMULATION
:PROPERTIES:
:ID: ba0ec473-43d7-4211-98a8-da6ad853b696
:END:
- [[https://twitter.com/kayvonf/status/1688582905394757633][madrona-engine]]: [[https://madrona-engine.github.io/][ECS-based]] game engine that runs 10,000s of environments in parallel on a single GPU
- [[https://virl-platform.github.io/][V-IRL]]: Grounding Virtual Intelligence in Real Life
  - tests foundation models in virtual real-world cities, with geospatial data and street view imagery
* FINETUNING
- [[https://arxiv.org/abs/2401.04105][Dr2Net]]: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
  - a surrogate network to finetune a pretrained model with substantially reduced memory consumption
  - comparable performance to conventional finetuning but with significantly less memory usage
- [[https://arxiv.org/abs/2401.15657][Data-Free]] [[https://github.com/ylong4/DFZSL][Generalized]] Zero-Shot Learning (using only its CLIP features)
- [[https://arxiv.org/abs/2403.02334][Gradient Correlation]] Subspace Learning against Catastrophic Forgetting
  - detects a subspace of the weights that is least affected by previous tasks and trains the new task within that subspace
- [[https://twitter.com/_akhaliq/status/1770675608575435046][Evolutionary]] Optimization of Model Merging Recipes
  - facilitates cross-domain merging, automated model composition
- [[https://twitter.com/_akhaliq/status/1772828395107192981][The Unreasonable]] Ineffectiveness of the Deeper Layers
  - identifies the optimal block of layers to prune by considering similarity across layers (see the sketch below)
  - then, to "heal" the damage, a small amount of finetuning is performed
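A rough sketch of that layer-similarity pruning criterion (a reading of the idea, not the paper's code; the =hidden_states= list of per-layer activations is assumed to come from a forward pass that returns all hidden states):

#+begin_src python
import torch
import torch.nn.functional as F

def most_prunable_block(hidden_states, n):
    """hidden_states: list of (tokens, dim) activations entering each layer.
    Return the start index l of the n-layer block whose input and output
    representations are most similar (smallest angular distance), i.e. the
    block whose removal should perturb the residual stream the least."""
    best_l, best_dist = 0, float("inf")
    for l in range(len(hidden_states) - n):
        a = F.normalize(hidden_states[l], dim=-1)
        b = F.normalize(hidden_states[l + n], dim=-1)
        cos = (a * b).sum(-1).clamp(-1.0, 1.0)             # per-token cosine similarity
        dist = torch.arccos(cos).mean().item() / torch.pi  # mean angular distance
        if dist < best_dist:
            best_l, best_dist = l, dist
    return best_l

# Toy usage with random activations standing in for a real model's layers;
# after dropping layers best_l .. best_l+n-1, a short (Q)LoRA finetune "heals" the model.
layers = [torch.randn(128, 512) for _ in range(25)]
print(most_prunable_block(layers, n=4))
#+end_src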
** FINETUNES
*** YOLO
- https://docs.ultralytics.com/yolov5/tutorials/train_custom_data/#train-on-custom-data
* GAN ALTERNATIVE
- [[id:9e94f7d8-752f-48e9-9ef1-9c79eba258e3][ONE STEP DIFFUSION]]