parent: stable_diffusiontrain
BETTER DECODER; blue noise: NOISE CONTROL
400x (and use VAE tiling to make big images)
Diffusers-Compatible SDXL UNet Rewrite (520 lines)
ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection
scales the coefficients of long skip connections (LSCs, which connect distant blocks) in the UNet to improve training stability (sketch below)
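A minimal sketch of the idea, assuming a toy two-level UNet: damp each long skip connection with a constant coefficient before merging it. The architecture and kappa=0.7 are illustrative, not the paper's tuned setup.

```python
import torch
import torch.nn as nn

class ScaledSkipUNet(nn.Module):
    """Toy UNet where each long skip connection is multiplied by a constant
    coefficient kappa before concatenation, damping the LSC contribution
    for more stable training. kappa=0.7 is illustrative, not a tuned value."""
    def __init__(self, ch=64, kappa=0.7):
        super().__init__()
        self.kappa = kappa
        self.down1 = nn.Conv2d(3, ch, 3, stride=2, padding=1)
        self.down2 = nn.Conv2d(ch, 2 * ch, 3, stride=2, padding=1)
        self.mid = nn.Conv2d(2 * ch, 2 * ch, 3, padding=1)
        self.up2 = nn.ConvTranspose2d(4 * ch, ch, 4, stride=2, padding=1)
        self.up1 = nn.ConvTranspose2d(2 * ch, 3, 4, stride=2, padding=1)

    def forward(self, x):
        h1 = torch.relu(self.down1(x))
        h2 = torch.relu(self.down2(h1))
        m = torch.relu(self.mid(h2))
        # scale the long skip connections instead of passing them unchanged
        u2 = torch.relu(self.up2(torch.cat([m, self.kappa * h2], dim=1)))
        return self.up1(torch.cat([u2, self.kappa * h1], dim=1))
```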
Cas-DM: Bringing Metric Functions into Diffusion Models (incorporates additional metric functions as training objectives)
Quantum Denoising Diffusion Models
explores integrating variational quantum circuits to improve the efficacy of diffusion models
MPI: Masked Pre-trained Model Enables Universal Zero-shot Denoiser
masked pre-training spontaneously yields strong image denoising ability
Simplified Diffusion Schrödinger Bridge
simplification of the Diffusion Schrödinger Bridge (DSB) that facilitates its unification with Score-based Generative Models (SGMs)
RL for Consistency Models: Faster Reward-Guided Text-to-Image Generation
fine-tunes via RL to optimize task-specific rewards while keeping training and inference fast
Reinforcement Learning for Consistency Models (RLCM)
handles objectives that are hard to specify via prompting, such as image compressibility and human feedback (sketch below)
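A heavily hedged sketch of the reward-guided idea using plain reward-weighted regression; this is NOT RLCM's policy-gradient formulation, and `consistency_model` (noise -> image) and `reward_fn` (images -> scores) are hypothetical stand-ins.

```python
import torch

def reward_weighted_step(consistency_model, reward_fn, optimizer, batch=8):
    """One simplified fine-tuning step: regress the model toward its own
    high-reward samples (generic reward-weighted regression, not RLCM)."""
    z = torch.randn(batch, 3, 64, 64)                       # initial noise
    with torch.no_grad():
        samples = consistency_model(z)                      # one-step generation
        weights = torch.softmax(reward_fn(samples), dim=0)  # (batch,) weights
    pred = consistency_model(z)                             # differentiable pass
    per_sample = ((pred - samples) ** 2).mean(dim=(1, 2, 3))
    loss = (weights * per_sample).sum()                     # high reward -> high weight
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```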
LP-DiF: Learning Prompt with Distribution-Based Feature Replay for Few-Shot Class-Incremental Learning
continuously learn new classes without forgetting old ones
S2-DMs: Skip-Step Diffusion Models
a new training objective, L_skip, designed to reintegrate information omitted during the selective sampling phase
Switch EMA: A Free Lunch for Better Flatness and Sharpness
switches the EMA parameters back into the original model after each epoch, dubbed Switch EMA (SEMA)
a free lunch that boosts convergence speed (sketch below)
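A minimal sketch of the SEMA loop as described above: keep an EMA copy of the weights, then load it back into the live model at each epoch boundary. The decay value is illustrative.

```python
import copy
import torch

def train_with_switch_ema(model, loader, optimizer, loss_fn, epochs, decay=0.999):
    """Standard training with an EMA shadow copy; after every epoch the EMA
    weights are switched back into the live model (Switch EMA / SEMA)."""
    ema = copy.deepcopy(model)
    for p in ema.parameters():
        p.requires_grad_(False)
    for epoch in range(epochs):
        for x, y in loader:
            loss = loss_fn(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # usual EMA update after each optimizer step
            with torch.no_grad():
                for p_ema, p in zip(ema.parameters(), model.parameters()):
                    p_ema.mul_(decay).add_(p, alpha=1 - decay)
        # the "switch": copy EMA weights into the trained model each epoch
        model.load_state_dict(ema.state_dict())
```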
Rolling Diffusion Model (VIDEO)
a sliding-window denoising process
assigns more noise to frames that appear later in the sequence (sketch below)
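A sketch of the frame-dependent corruption, assuming a simple linear local-time schedule (the paper's exact parameterization may differ): frames later in the window get higher noise levels.

```python
import torch

def rolling_noise_levels(num_frames, global_t):
    """Per-frame noise levels for one sliding window: later frames get more
    noise. Linear local-time sketch; global_t is a scalar in [0, 1)."""
    k = torch.arange(num_frames)
    return (global_t + k / num_frames) % 1.0        # per-frame t in [0, 1)

def noise_window(frames, global_t):
    """Corrupt a (num_frames, C, H, W) window with frame-dependent noise."""
    t = rolling_noise_levels(frames.shape[0], global_t).view(-1, 1, 1, 1)
    noise = torch.randn_like(frames)
    # simple linear interpolation between clean frames and pure noise
    return (1 - t) * frames + t * noise, t
```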
Fixed Point Diffusion Models
reallocates computation across timesteps and reuses fixed-point solutions between timesteps (sketch below)
87% fewer parameters, consumes 60% less training memory
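A sketch of the reuse idea: warm-start each timestep's fixed-point solve from the previous solution so later solves converge in fewer iterations. `f` (the implicit layer), `step_fn` (the denoising update), and `z_init` are hypothetical stand-ins, not the paper's API.

```python
import torch

def fixed_point_solve(g, z0, max_iters=20, tol=1e-4):
    """Naive fixed-point iteration z <- g(z), starting from z0."""
    z = z0
    for _ in range(max_iters):
        z_next = g(z)
        if (z_next - z).norm() <= tol * z.norm().clamp(min=1e-8):
            return z_next
        z = z_next
    return z

def sample_with_reuse(f, x_T, timesteps, step_fn, z_init):
    """Reuse the previous timestep's fixed-point solution as the next
    timestep's initial guess (warm start)."""
    x, z = x_T, z_init
    for t in timesteps:
        z = fixed_point_solve(lambda zz: f(zz, x, t), z)  # warm start from old z
        x = step_fn(x, z, t)                              # one denoising update
    return x
```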
Analyzing and Improving the Training Dynamics of Diffusion Models
redesigns the network, yielding better networks at equal computational complexity
precise tuning of EMA length without the cost of performing several training runs
ConPreDiff: Improving Diffusion-Based Image Synthesis with Context Prediction (better zero-shot)
Any-Shift Prompting for Generalization over Distributions
encodes distribution information and the relationships between distributions
to guide the generalization of the CLIP image-language model from training to any test distribution
Structure-Preserving Diffusion Models
result: if you rotate the input, the output rotates accordingly; the model learns equivariant structure
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
gradually transforms a Denoising Diffusion Model (DDM) into a classical Denoising Autoencoder (DAE)
FLAWED: the VAE used for Stable Diffusion 1.x/2.x and other models (KL-F8) has a critical flaw, probably due to bad training; it needs a new one trained from scratch, as with SDXL ==best==
the encoder has to do a lot of extra work to get around the bad latent space
distributed-diffusion: using Hivemind (distributed training) vs DeepSpeed
SiT: discrete transformers
4/8-bit models: Q-Diffusion, quantization insight (reddit)
Memory-Efficient Personalization using Quantized Diffusion Model (enhancing it)
Enhanced Distribution Alignment for Post-Training Quantization of Diffusion Models
aligns outputs of the quantized model and the full-precision model at different network granularities (sketch below)
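A sketch of the alignment idea at block granularity, assuming matched lists of full-precision and quantized blocks (`fp_blocks`/`q_blocks` are hypothetical; only the quantization parameters of the quantized blocks should be trainable).

```python
import torch
import torch.nn.functional as F

def align_blockwise(fp_blocks, q_blocks, calib_batches, optimizer):
    """Post-training alignment sketch: feed calibration data through matched
    full-precision and quantized blocks and minimize the output gap."""
    for x in calib_batches:
        h_fp, h_q, loss = x, x, 0.0
        for fp_blk, q_blk in zip(fp_blocks, q_blocks):
            with torch.no_grad():
                h_fp = fp_blk(h_fp)                  # reference activations
            h_q = q_blk(h_q)
            loss = loss + F.mse_loss(h_q, h_fp)      # align at block granularity
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```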
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
finetuning the quantized model to better adapt to the activation distribution (mitigation)
Task-Oriented Diffusion Model Compression
satisfactory output quality with 39.2% and 56.4% reductions in model footprint, and 81.4% and 68.7%
when applied to InstructPix2Pix and StableSR, respectively
GIT RE-BASIN: MERGING MODELS MODULO PERMUTATION SYMMETRIES
transfers knowledge from a teacher to a student model
Idempotent Generative Network
f(f(z)) = f(z); can generate an output in one step
a step towards a "global projector": projecting any input into a target data distribution (sketch below)
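A sketch of two of the objectives implied by f(f(z)) = f(z); the paper's full objective also includes a "tightness" term with careful gradient stopping, omitted here for brevity.

```python
import torch

def ign_loss_sketch(f, x_real, z, lam=1.0):
    """Idempotent-generator sketch: real data should be fixed points
    (f(x) ~ x) and the map should be idempotent on noise (f(f(z)) ~ f(z))."""
    rec = ((f(x_real) - x_real) ** 2).mean()        # data are fixed points
    fz = f(z)
    idem = ((f(fz) - fz.detach()) ** 2).mean()      # pull f(f(z)) toward f(z)
    return rec + lam * idem
```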
uform: CLIP not required, trained in a day
cloneofsimo: learning from the CLIP
wanna perform affordable kernel regression on l2-normalized data?
get yourself Spherical Random Features for Polynomial Kernels
relevant if you are aiming for large scale non-parametric regression on CLIP projected feature spaces
Efficient Diffusion Training via Min-SNR Weighting Strategy
addresses slow convergence caused by conflicting optimization directions between timesteps; 3.4 times faster (sketch below)
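A minimal sketch of the Min-SNR weight for epsilon-prediction, w(t) = min(SNR(t), gamma) / SNR(t) with SNR(t) = abar_t / (1 - abar_t); gamma = 5 follows the paper, and `alphas_cumprod` is the usual DDPM schedule.

```python
import torch

def min_snr_weights(alphas_cumprod, t, gamma=5.0):
    """Min-SNR loss weights for epsilon-prediction:
    w(t) = min(SNR(t), gamma) / SNR(t)."""
    snr = alphas_cumprod[t] / (1.0 - alphas_cumprod[t])
    return torch.clamp(snr, max=gamma) / snr

# usage in a DDPM-style step (eps_pred, eps: (B, C, H, W); t: (B,) long):
# w = min_snr_weights(alphas_cumprod, t)
# loss = (w * ((eps_pred - eps) ** 2).mean(dim=(1, 2, 3))).mean()
```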
Imagen suggests that scaling the text encoder is much more impactful than scaling the UNet
at least for diffusion models
MosaicML: custom $50k Stable Diffusion training, reddit post
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models
compressed-stable-diffusion: 36% reduction in parameters and latency
Wuerstchen: Efficient Pretraining of Text-to-Image Models
16 times faster to train, 2 times faster inference, only 9,200 GPU hours (42x compression rate vs 8x for SD)
DREAM: Diffusion Rectification and Estimation-Adaptive Models (requires minimal code changes)
2 to 3 times faster training convergence
PERCEPTUAL LOSS ==best==
LCM ==best==
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
integrates diffusion with a GAN objective for one-step generation
faster by modeling data as electric charges (Poisson Flow Generative Models) https://www.assemblyai.com/blog/an-introduction-to-poisson-flow-generative-models/
better than inference: https://twitter.com/_akhaliq/status/1620958983639924736 https://arxiv.org/pdf/2302.00482.pdf
Spectral Diffusion: a slim Standard Diffusion, 20 times smaller in size
Wavelet Diffusion Models are Fast and Scalable Image Generators
Score-Based Diffusion Models as Principled Priors for Inverse Imaging (more complex priors)
Shifted Diffusion (==Corgi==) for Text-to-Image Generation: from CLIP straight to diffusion, ==only 1.7% of the images required captions==
Object Detection: CutLER
D3S: Invariant Learning via Diffusion Dreamed Distribution Shifts; separates foreground from background
disentangles foreground from background by cutting and pasting them into the synthetic training dataset (sketch below)
like SVDiff
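A minimal sketch of the cut-and-paste compositing, assuming a segmentation mask is already available: paste a segmented foreground onto an unrelated background so the two vary independently in the synthetic dataset.

```python
import numpy as np

def cut_and_paste(fg_img, fg_mask, bg_img):
    """Composite a segmented foreground onto an unrelated background.
    fg_img/bg_img: (H, W, 3) uint8 arrays; fg_mask: (H, W) boolean array."""
    out = bg_img.copy()
    out[fg_mask] = fg_img[fg_mask]   # paste foreground pixels over background
    return out
```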
A Picture is Worth a Thousand Words: Principled Recaptioning Improves Image Generation
automatic captioning is better than crawled low-quality captions
CapsFusion: Rethinking Image-Text Data at Scale
progress is hindered by simplistic captioners; consolidates and refines caption information
Structure-Guided Adversarial Training of Diffusion Models
compel the model to learn manifold structures between samples in each training batch
Neural Congealing: Aligning Images to a Joint Semantic Atlas
zero-shot learning of concept shapes
ASIC: Aligning Sparse in-the-wild Image Collections
Ablating Concepts in Text-to-Image Diffusion Models (Adobe)
masking to accelerate learning: VQ-Diffusion https://arxiv.org/pdf/2111.14822.pdf
DeepMIM: Deep Supervision for Masked Image Modeling
pre-trains a Vision Transformer (ViT) via a mask-and-predict scheme.
Predicting masked tokens in stochastic locations improves masked image modeling
learns features that are more robust to location uncertainty; Masked Image Modeling (MIM) (sketch below)
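A minimal mask-and-predict setup for MIM, masking random patches at the pixel level; patch size and mask ratio are illustrative, and the prediction head/tokenizer choices from the papers above are omitted.

```python
import torch

def mask_patches(images, patch=16, mask_ratio=0.6):
    """Randomly mask a fraction of non-overlapping patches and return the
    corrupted images plus the boolean patch mask (True = masked)."""
    b, c, h, w = images.shape
    mask = torch.rand(b, h // patch, w // patch) < mask_ratio
    pixel_mask = mask.repeat_interleave(patch, 1).repeat_interleave(patch, 2)
    corrupted = images * (~pixel_mask).unsqueeze(1)   # zero out masked pixels
    return corrupted, mask

# training: the model reconstructs the masked content, with the loss
# computed only on masked patches.
```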
I have recently written a paper on understanding transformer learning via the lens of coinduction & Hopf algebra. https://arxiv.org/abs/2302.01834
The learning mechanism of transformer models was poorly understood; it turns out that a transformer is like a circuit with feedback.
I argue that autodiff can be replaced with what I call Hopf coherence in the paper, which happens within a single layer as opposed to across the whole graph.
Furthermore, if we view transformers as Hopf algebras, one can bring convolutional models, diffusion models and transformers under a single umbrella.
I'm working on a next gen Hopf algebra based machine learning framework.
Join my discord if you want to discuss this further https://discord.gg/mr9TAhpyBW