multiple models working together:
diffusion: MULTIPLE DIFFUSION GIT RE-BASIN
text: RAD, EFT
logistic: Auto-Instruct
Self-Supervised Learning with Lie Symmetries for Partial Differential Equations
computationally efficient alternatives to numerical solvers
self-supervised learning of general-purpose representations of PDEs from heterogeneous data
Q* New Objective Q-Learning and Q* - Decision Making Under Uncertainty (CS238/AA228)
Q-learning parallels biological reward neurocircuitry (reinforcement learning, RL)
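A minimal tabular Q-learning sketch, assuming a toy chain environment I made up for illustration; only the Bellman update line is the standard algorithm:

```python
import numpy as np

# Hypothetical toy MDP: a 5-state chain with 2 actions (illustrative only).
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    # Placeholder dynamics: action 1 moves right, action 0 moves left;
    # reward 1 only when reaching the last state.
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == n_states - 1 else 0.0)

for episode in range(500):
    s = 0
    for t in range(50):
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next, r = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if r == 1.0:
            break
```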
Model-Based Control with Sparse Neural Dynamics (aggressive sparsification, distillation)
sparsify it by removing redundant neurons; applicable to a wide variety of DNNs
Zero Bubble Pipeline Parallelism
an algorithm that finds the optimal schedule given the configuration and memory limit
Fuyou Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU
training with a low-end GPU and limited CPU memory capacity
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
post-trains an LLM using preference feedback from a teacher model to iteratively improve over itself
marries the simplicity and stability of contrastive learning with the theoretical generality of optimizing general preferences
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples
rivaling supervised methods such as Random Forest, Bagging, or Gradient Boosting
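A hedged sketch of how such in-context regression can be probed: serialize (x, y) pairs as text, append the query x, and parse the numeric completion. The prompt format and the `query_llm` callable are assumptions, not the paper's exact setup:

```python
import re

def build_regression_prompt(train_xs, train_ys, query_x):
    # Serialize in-context examples as plain "Input/Output" lines.
    lines = [f"Input: {x:.3f}\nOutput: {y:.3f}" for x, y in zip(train_xs, train_ys)]
    lines.append(f"Input: {query_x:.3f}\nOutput:")
    return "\n\n".join(lines)

def llm_regress(query_llm, train_xs, train_ys, query_x):
    # query_llm is a placeholder for any text-completion endpoint.
    completion = query_llm(build_regression_prompt(train_xs, train_ys, query_x))
    # Parse the first number in the completion as the prediction.
    match = re.search(r"-?\d+(\.\d+)?", completion)
    return float(match.group()) if match else None
```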
Deep neural networks are robust to weight binarization and other non-linear distortions
0.68 effective bits per weight (below 1 bit models)
points to the idea that a stochastic memory element can be used
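A minimal sketch of the kind of distortion being tested: binarize each weight matrix to ±alpha and check how little accuracy drops. The layer-wise scale alpha = mean(|W|) is a common choice and an assumption here, not necessarily the paper's exact recipe:

```python
import torch

@torch.no_grad()
def binarize_weights(model: torch.nn.Module) -> torch.nn.Module:
    # Replace each weight tensor W with alpha * sign(W), where
    # alpha = mean(|W|) roughly preserves the layer's weight scale.
    for name, param in model.named_parameters():
        if param.dim() >= 2:  # only matrix/conv weights, not biases or norms
            alpha = param.abs().mean()
            param.copy_(alpha * param.sign())
    return model
```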
Scalable MatMul-free Language Modeling
replaces MatMul operations in dense layers with ternary accumulations using weights constrained to {-1, 0, +1}
reducing computational cost and memory while preserving network expressiveness
GPUs? they removed matmuls but still use Hadamard products (element-wise products), which are also embarrassingly parallel and can be GPU-accelerated
using flexible GPUs for training and FPGAs/ASICs for inference is the optimal tradeoff here
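A toy sketch of the core trick: with weights constrained to {-1, 0, +1}, a matrix product reduces to selective additions and subtractions of activations. This is a plain NumPy illustration of the idea, not the paper's fused kernel:

```python
import numpy as np

def ternary_accumulate(x, W_ternary):
    # x: (d_in,) activations; W_ternary: (d_in, d_out) with entries in {-1, 0, +1}.
    # Equivalent to x @ W_ternary, but written as add/subtract accumulations
    # to show that no multiplications are needed.
    out = np.zeros(W_ternary.shape[1], dtype=x.dtype)
    for j in range(W_ternary.shape[1]):
        col = W_ternary[:, j]
        out[j] = x[col == 1].sum() - x[col == -1].sum()
    return out

x = np.random.randn(8).astype(np.float32)
W = np.random.choice([-1, 0, 1], size=(8, 4)).astype(np.float32)
assert np.allclose(ternary_accumulate(x, W), x @ W, atol=1e-5)
```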
optimizer states reduced from 32 bits to 8 bits
faster matrix multiplication using approximations
feedback: FEEDBACK AS TARGET, HUMAN FEEDBACK, PROPER-ING INSTRUCTIONS
AlignProp Aligning Text-to-Image Diffusion Models with Reward Backpropagation
aligns the diffusion model to reward functions by backpropagating reward gradients through the sampling process
CPL Contrastive Preference Learning: Learning from Human Feedback without RL
learning optimal policies from preferences without learning reward functions
regret-based model of human preferences instead of reward
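A rough sketch of my reading of the idea, not the paper's exact objective: score each trajectory segment by its discounted sum of policy log-probabilities (a regret proxy) and apply a Bradley-Terry-style contrastive loss between preferred and dispreferred segments. The `policy.log_prob` interface is an assumed placeholder:

```python
import torch
import torch.nn.functional as F

def segment_score(policy, states, actions, gamma=0.99):
    # Discounted sum of log pi(a_t | s_t) over a trajectory segment.
    logps = policy.log_prob(states, actions)  # assumed API: returns (T,) log-probs
    discounts = gamma ** torch.arange(len(logps), dtype=logps.dtype)
    return (discounts * logps).sum()

def contrastive_preference_loss(policy, preferred_seg, dispreferred_seg):
    # Bradley-Terry style: push the preferred segment's score above the other's,
    # with no reward model and no RL rollouts involved.
    s_pos = segment_score(policy, *preferred_seg)
    s_neg = segment_score(policy, *dispreferred_seg)
    return -F.logsigmoid(s_pos - s_neg)
```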
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
hand-specifying a reward function is often infeasible; learning a reward model from human feedback is often very expensive
VLMs (CLIP) as reward models: a single-sentence text prompt describing the desired task
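A minimal sketch of CLIP-as-reward: the reward is the cosine similarity between the current frame's image embedding and the task description's text embedding. The checkpoint name and task prompt below are placeholders:

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
task_prompt = "a robot arm stacking the red block on the blue block"  # hypothetical task

@torch.no_grad()
def clip_reward(frame):  # frame: PIL image of the current observation
    inputs = processor(text=[task_prompt], images=frame, return_tensors="pt", padding=True)
    img = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt = model.get_text_features(input_ids=inputs["input_ids"],
                                  attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img * txt).sum().item()  # cosine similarity used as the reward
```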
TD-MPC2 Scalable, Robust World Models for Continuous Control
a single agent that performs 80 tasks across multiple task domains, embodiments, and action spaces
performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model
Text2Reward Automated Dense Reward Function Generation for Reinforcement Learning
automates the generation of dense reward functions using an LLM
Eureka Human-Level Reward Design via Coding Large Language Models
generates reward functions that outperform expert human-engineered rewards
so agents can now acquire complex skills via reinforcement learning, optimizing over the generated reward
to solve sequential decision-making tasks
in-context RLHF to incorporate feedback, steering and aligning the reward function
outer loop: inference-only LLM instructs a learnable NN to refine the reward function
inner loop: reinforcement learning to train a controller
pen spinning
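A high-level sketch of the outer/inner loop described above; every function here (`llm_propose_reward`, `train_policy`, `evaluate`, `summarize_feedback`) is a hypothetical placeholder for the corresponding stage, not an API from the paper:

```python
def reward_design_loop(env_source_code, task_description, n_iterations=5, n_samples=4):
    best_reward_fn, best_score = None, float("-inf")
    feedback = ""
    for it in range(n_iterations):
        # Outer loop: an inference-only LLM proposes candidate reward functions as code,
        # conditioned on the environment source, the task, and feedback from past rounds.
        candidates = [llm_propose_reward(env_source_code, task_description, feedback)
                      for _ in range(n_samples)]
        results = []
        for reward_fn in candidates:
            # Inner loop: reinforcement learning trains a controller against the candidate reward.
            policy = train_policy(reward_fn)
            results.append((evaluate(policy), reward_fn))
        score, reward_fn = max(results, key=lambda r: r[0])
        if score > best_score:
            best_score, best_reward_fn = score, reward_fn
        # Summarize training statistics as textual feedback for the next round.
        feedback = summarize_feedback(results)
    return best_reward_fn
```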
ConvNeXt (vs ViT, for image classification)
accurate, efficient, scalable and very simple in design
for: zero-shot image classification, image and text retrieval
clip convnext: https://huggingface.co/laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft (320 vs 320)
CNCA Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
replacing linear recurrence with a special temporal convolutional network
permits larger receptive field size with shallower networks
reduces the computational complexity to O(L)
PanGu-π Enhancing Language Model Architectures via Nonlinearity Compensation
shortcut used to enhance the model nonlinearity, 10% inference speed-up
nonlinearity of this kind is usual in convolutional networks for vision tasks
muP proposes the "right way to scale": an effective weight-init scheme for finding optimal hyperparameters on a small model and transferring them to a larger one
are LLMs just text compression algorithms?
LLMZip Lossless Text Compression using Large Language Models
gzip instead of parameters for classification
Low-Resource Text Classification: A Parameter-Free Classification Method with Compressors
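A minimal version of the compressor-based classifier: standard gzip, normalized compression distance, and kNN voting. This follows the general recipe rather than the paper's exact code:

```python
import gzip
import numpy as np

def clen(s: str) -> int:
    # Compressed length of a string in bytes.
    return len(gzip.compress(s.encode("utf-8")))

def ncd(a: str, b: str) -> float:
    # Normalized compression distance between two strings.
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(query: str, train_texts, train_labels, k: int = 3):
    # Parameter-free kNN over compression distances (no trained weights at all).
    dists = np.array([ncd(query, t) for t in train_texts])
    top_k = np.array(train_labels)[np.argsort(dists)[:k]]
    values, counts = np.unique(top_k, return_counts=True)
    return values[counts.argmax()]
```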
Knowledge Translation A New Pathway for Model Compression
teacher-student model that receives parameters and generates compressed ones
AdaLoRA adaptively allocates the parameter budget among weight matrices according to their importance (adaptive LoRA)
FLIQS One-Shot Mixed-Precision Floating-Point and Integer Quantization Search
mixed-precision quantization, eliminates the need for retraining
Lion, an optimizer reported to be better than Adam
Sketchy Memory-efficient Adaptive Regularization with Frequent Directions
Kronecker-factored diagonal eigenvalues, Frequent Directions
One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
Optimized Network Architectures for Large Language Model Training with Billions of Parameters
only small subgroups of GPUs require high-bandwidth any-to-any communication within them
dimensionality reduction algorithms
t-SNE and UMAP have long been the favorites
"Deep TDA" combines self-supervised learning and Topological Data Analysis (TDA)
unlock new insights from complex datasets
more robust to noise and outliers in the data
Gen2Det Generate to Detect
directly generating scene-centric images (synthetic)
improves the performance on rare categories
Image classification network enhancement methods based on knowledge injection
knowledge injection dataset to improve interpretability and classification performance of hidden layers
MovieLLM Enhancing Long Video Understanding with AI-Generated Movies
generates a script and corresponding video as a dataset
In-Context Principle Learning from Mistakes
induce the model to make mistakes; then reflect on these mistakes and learn explicit task-specific "principles" from them, which help avoid similar mistakes
MatSynth Physically Based Rendering (PBR) materials dataset (4,000 ultra-high resolution)
FindingEmo An Image Dataset for Emotion Recognition in the Wild
annotated dimensions include: valence, arousal and emotion
English public-domain books
Annotated Hands for Generative Models
with three additional channels that provide annotations for hands in the image, adding structure
Learning to Identify Critical States for Reinforcement Learning from Videos
mask-based sensitivity analysis to ==identify important== critical states
recognizes relevant states/actions/rewards from untagged videos
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
uses an LLM to extrapolate the errors made by a small model trained on the synthesized dataset
GeNIe Generative Hard Negative Images Through Diffusion (synthetic enhanced dataset)
generate challenging samples for the target category
DistDiff: Distribution-Aware Data Expansion with Diffusion Models
dataset expansion framework based on the distribution-aware diffusion model
hierarchical prototypes to approximate the real data distribution
madrona-engine ECS-based game engine that runs 10,000s of environments in parallel on a single GPU
V-IRL Grounding Virtual Intelligence in Real Life
tests foundation models in virtual real-world cities using geospatial data and street-view imagery
Dr2Net Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
surrogate network to finetune a pretrained model with substantially reduced memory consumption
comparable performance to conventional finetuning but with significantly less memory usage
Data-Free Generalized Zero-Shot Learning (using only its CLIP features)
Gradient Correlation Subspace Learning against Catastrophic Forgetting
detects a subspace of the weights that is least affected by previous tasks and trains the new task within that subspace
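One way to picture the mechanism (my interpretation, not the paper's exact procedure): collect gradients from previous-task data, take an SVD, and treat the directions with the smallest singular values as the "least affected" subspace into which the new task's updates are projected:

```python
import torch

def least_affected_subspace(prev_task_grads, k):
    # prev_task_grads: (n_samples, n_params) matrix of flattened gradients
    # computed on previous-task data. Within the span of those gradients,
    # the directions with the smallest singular values are the ones they barely use.
    _, _, Vh = torch.linalg.svd(prev_task_grads, full_matrices=False)
    return Vh[-k:]                      # (k, n_params) basis of the "quiet" subspace

def project_update(grad, basis):
    # Restrict the new task's gradient to the selected subspace
    # so it interferes as little as possible with earlier tasks.
    coeffs = basis @ grad               # (k,)
    return basis.T @ coeffs             # (n_params,)
```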
Evolutionary Optimization of Model Merging Recipes
facilitates cross-domain merging, automated model composition
The Unreasonable Ineffectiveness of the Deeper Layers
identify optimal block of layers to prune by considering similarity across layers
then, to "heal" the damage, we perform a small amount of finetuning