Open Source Models but for new Hardware
cpp generation library and list of supported models (GPT, RWKV): ggml
Language Model Inversion
given the output, reconstruct the original prompt
LoRA or QLoRA
llama plugins: https://twitter.com/algo_diver/status/1639681733468753925
llama tools: https://github.com/OpenBMB/ToolBench
streaming vs non-streaming generation
langchain, and https://github.com/srush/MiniChain
PEARL Prompting Large Language Models to Plan and Execute Actions Over Long Documents
MemGPT: manages memory tiers to effectively provide extended context within the LLM's limited context window
LLMs taught to manage their own memory, resembling paging in an OS (main context, external context) ==best==
trained to generate function calls
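A minimal sketch of the paging idea (not the MemGPT API; class and function names here are made up): a bounded main context plus an unbounded external archive, with tool-style functions the model can call to search or store.

```python
# Minimal sketch of MemGPT-style memory paging (hypothetical names, not the
# MemGPT API): the model's function calls move data between a bounded
# "main context" and an unbounded "external context".
from collections import deque

class TieredMemory:
    def __init__(self, main_budget_tokens=2000):
        self.main = deque()          # messages currently inside the context window
        self.archive = []            # external storage, searchable, unbounded
        self.budget = main_budget_tokens

    def _tokens(self):
        return sum(len(m.split()) for m in self.main)  # crude token count

    def add(self, message):
        self.main.append(message)
        while self._tokens() > self.budget:            # "page out" oldest items
            self.archive.append(self.main.popleft())

    # functions the LLM is trained/prompted to call:
    def archive_search(self, query, k=3):
        hits = [m for m in self.archive if query.lower() in m.lower()]
        return hits[:k]

    def archive_insert(self, note):
        self.archive.append(note)

memory = TieredMemory(main_budget_tokens=6)   # tiny budget to force paging in this demo
memory.add("user: my dog is named Bruno")
memory.add("assistant: noted!")
# the early message has been paged out of main context but remains retrievable:
print(memory.archive_search("dog"))
```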
ClinicalCamel An Open-Source Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding; medical, doctor
Personality Traits in Large Language Models, quantifying personalities
ChipNeMo Domain-Adapted LLMs for Chip Design
LARP Language-Agent Role Play for Open-World Games
decision-making assistant, framework refines interactions between users and agents
PosterLlama Bridging Design Ability of Language Model to Content-Aware Layout Generation
reformatting layout elements into HTML code
unconditional layout generation, element conditional layout generation, layout completion
Pix2Struct: image-to-text (screenshot/plot parsing)
DePlot: plot-to-text model helping LLMs understand plots
MatCha: great chart & math capabilities by plot deconstruction & numerical reasoning objectives
StructLM Towards Building Generalist Models for Structured Knowledge Grounding
based on the Code-LLaMA architecture
SaulLM-7B A pioneering Large Language Model for Law
designed explicitly for legal text comprehension and generation
Pixel Aligned Language Models
can take locations (set of points, boxes) as inputs or outputs
location-aware vision-language tasks
CrossCodeEval A Diverse and Multilingual Benchmark for Cross-File Code Completion
cross-file contextual understanding
Mixtral-8x7B > CodeLlama-34B (on HumanEval)
Llemma An Open Language Model For Mathematics
capable of tool use and formal theorem proving
Large Language Models for Mathematicians (academic)
mathematical description of the transformer model used in all modern language models
Chronos Learning the Language of Time Series
improve zero-shot accuracy on unseen forecasting tasks; forecasting pipeline
MathVerse Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
extract crucial reasoning steps, to reveal the intermediate reasoning quality
MLLMs
DeciCoder decoder-only code completion model
approach of grouping tokens into clusters and having each token attend to others only within its cluster
Magicoder: Source Code Is All You Need
MagicoderS-CL-7B based on CodeLlama
StepCoder Improve Code Generation with Reinforcement Learning from Compiler Feedback
breaks the long-sequence code generation task into a curriculum of code completion subtasks
while masking the unexecuted code segments for fine-grained optimization
Enhancing Network Management Using Code Generated by Large Language Models
program synthesis: generate task-specific code from natural language queries
analyzing network topologies and communication graphs
CodeFusion A Pre-trained Diffusion Model for Code Generation ==diffusion== (75M vs 1B auto-regressive)
iterative denoising, no need to start from scratch
Text Rendering Strategies for Pixel Language Models
characters as images, handle any script; PIXEL model
Grammar Prompting for Domain-Specific Language Generation with Large Language Models
like programming languages
predicts a BNF grammar given an input, then generates the output according to the rules of that grammar
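A rough sketch of the two-step prompting, assuming a generic `llm(prompt)` completion function (a placeholder, not a real API); a real system would additionally enforce the predicted grammar with a constrained decoder or a parser check.

```python
# Sketch of grammar prompting: first ask for a minimal BNF grammar, then
# generate the DSL output under that grammar. `llm` is a placeholder.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def grammar_prompted_generation(nl_query: str) -> str:
    # Step 1: predict a minimal BNF grammar specialized to this input.
    grammar = llm(
        "Write a minimal BNF grammar covering only the constructs needed to "
        f"answer the following request in the target DSL:\n{nl_query}"
    )
    # Step 2: generate the program, constrained (here only via the prompt)
    # to follow that grammar.
    return llm(
        f"Grammar:\n{grammar}\n\n"
        f"Request: {nl_query}\n"
        "Produce output that strictly follows the grammar above."
    )
```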
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
zero-shot prompts with only documentation are sufficient for tool usage
tool documentation > demonstrations
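A minimal illustration of a documentation-only prompt, with made-up tool names and no demonstrations:

```python
# Sketch: zero-shot tool use from documentation alone (no demonstrations).
# The tool names and prompt wording are illustrative, not from the paper.
TOOL_DOCS = """
search(query: str) -> str
    Returns the top web result for `query`.
calculator(expression: str) -> float
    Evaluates an arithmetic expression.
"""

def build_prompt(user_request: str) -> str:
    return (
        "You can call the following tools. Reply with a single call like "
        "tool_name(arguments).\n\n"
        f"{TOOL_DOCS}\n"
        f"User request: {user_request}"
    )

print(build_prompt("What is 17% of 2,340?"))
```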
ControlLLM Augment Language Models with Tools by Searching on Graphs
breaks down a complex task into clear subtasks, then searches a tool graph for the optimal solution path
Fay integrating language models and digital characters
elit provides NLP tools for tokenization, tagging, and recognition across languages
translation prompt: https://boards.4channel.org/g/thread/92468569#p92470651
EMMA Efficient Monotonic Multihead Attention
simultaneous speech-to-text translation on the Spanish and English translation task
OPRO: Optimization by PROmpting, Large Language Models as Optimizers
each step = generate new solutions from the previously generated solutions and their scores
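A sketch of that optimization loop; `llm` and `score` are placeholders for the optimizer model call and the task evaluator, and the prompt wording is an approximation.

```python
# Sketch of the OPRO loop: at each step the optimizer LLM sees previous
# (solution, score) pairs and proposes new candidates.
def opro(llm, score, n_steps=10, per_step=4):
    history = []  # list of (solution_text, score)
    for _ in range(n_steps):
        trajectory = "\n".join(
            f"text: {s}\nscore: {v}"
            for s, v in sorted(history, key=lambda x: x[1])[-20:]
        )
        prompt = (
            "Here are previous solutions with their scores (higher is better):\n"
            f"{trajectory}\n"
            "Propose a new, different solution that scores higher."
        )
        for _ in range(per_step):
            candidate = llm(prompt)
            history.append((candidate, score(candidate)))
    return max(history, key=lambda x: x[1])
```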
Large Language Models for Compiler Optimization
reducing instruction counts beyond what the compiler achieves alone
EvoPrompt Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
SparQ Attention: Bandwidth-Efficient LLM Inference
reducing memory bandwidth requirements within the attention blocks through selective fetching of the cached history (up to an 8x reduction)
thread summarizer https://labs.kagi.com/ai/sum?url=%3E%3E248633369
LLM Use Case: Summarization (using langchain)
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
iteratively incorporating missing salient entities without increasing the length
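A minimal sketch of the densification loop; `llm` is a placeholder chat call and the prompt wording is an approximation of the paper's.

```python
# Sketch of Chain-of-Density summarization: iteratively fold in missing
# salient entities while keeping the summary length roughly fixed.
def chain_of_density(llm, article: str, rounds: int = 5, words: int = 80) -> str:
    summary = llm(f"Summarize in about {words} words:\n{article}")
    for _ in range(rounds):
        summary = llm(
            f"Article:\n{article}\n\nCurrent summary:\n{summary}\n\n"
            "Identify 1-3 informative entities from the article that are "
            "missing from the summary, then rewrite the summary to include "
            f"them WITHOUT exceeding {words} words (fuse and compress, do not "
            "drop previously included entities)."
        )
    return summary
```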
LMDX Language Model-based Document Information Extraction and Localization
methodology to adapt arbitrary LLMs for document information extraction (without hallucination)
parent: diffusion
GENIE Large Scale Pre-training for Text Generation with Diffusion Model
TESS Text-to-Text Self-Conditioned Simplex Diffusion
AR-Diffusion Auto-Regressive Diffusion Model for Text Generation
PLANNER Generating Diversified Paragraph via Latent Language Diffusion Model
DiffusionDialog A Diffusion Model for Diverse Dialog Generation with Latent Space
enhances the diversity of dialog responses while maintaining coherence
allenai / OLMo: an actually open-source AI model
PASTA Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
identifies a small subset of attention heads, then applies precise attention reweighting on them
can be applied in addition to prompting
S2A: System 2 Attention (is something you might need too)
regenerates context to only include the relevant portions before responding
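A two-call sketch of the idea; `llm` is a placeholder and the filtering prompt is paraphrased, not the paper's exact wording.

```python
# Sketch of System 2 Attention: first ask the model to rewrite the context,
# keeping only the parts relevant to the question, then answer using the
# regenerated context only.
def s2a_answer(llm, context: str, question: str) -> str:
    filtered = llm(
        "Extract from the text below only the parts that are relevant and "
        "factual for answering the question; drop opinions, flattery, and "
        f"irrelevant details.\n\nText:\n{context}\n\nQuestion: {question}"
    )
    return llm(f"Context:\n{filtered}\n\nQuestion: {question}\nAnswer:")
```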
AcceleratingLLM Inference with Staged Speculative Decoding
restructure the speculative batch as a tree
MobileNMT Enabling Translation in 15MB and 30ms
FlashDecoding++ Faster Large Language Model Inference on GPUs
inference engine, 2-4x speedup; optimized handling of flat GEMM operations
Exponentially Faster Language Modelling
replacing feedforward networks with fast feedforward networks (FFFs)
engages just 12 out of 4095 neurons for each layer inference, 78x speedup
EAGLE: LLM decoding based on compression (compared with others: Medusa, Lookahead, vanilla)
sequence of second-top-layer features is compressible, making the prediction of subsequent feature vectors from previous ones easy by a small model
jina-embeddings-v2: 8k context length, BERT architecture
LLaMa ipfs
in browser (there is also the cpp one)
train all Llama-2 models on your own data
Open LLaMA Open-Source Reproduction, permissively licensed; Lit-LLaMA, RedPajama dataset
Falcon new family, open-source ==instruct finetuned too==
LLaMA Pro Progressive LLaMA with Block Expansion
take a pretrained model, freeze its params, then add new blocks
tune the model on new data without forgetting the old
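A PyTorch sketch of block expansion under these assumptions (the zeroed projection names follow Hugging Face LLaMA conventions and may differ for other architectures); this is an illustration, not the paper's code.

```python
# Sketch of block expansion in the spirit of LLaMA Pro: freeze the pretrained
# blocks, interleave newly added blocks initialized to act as (near-)identity,
# and train only the new blocks on the new-domain data.
import copy
import torch.nn as nn

def expand(model_blocks: nn.ModuleList, every: int = 4) -> nn.ModuleList:
    expanded = []
    for i, block in enumerate(model_blocks):
        block.requires_grad_(False)            # freeze pretrained weights
        expanded.append(block)
        if (i + 1) % every == 0:
            new_block = copy.deepcopy(block)   # same shape as its neighbor
            for name, p in new_block.named_parameters():
                # zero the output projections so the block starts as identity
                # (parameter names depend on the architecture)
                if "o_proj" in name or "down_proj" in name:
                    nn.init.zeros_(p)
                p.requires_grad_(True)         # only new blocks are trained
            expanded.append(new_block)
    return nn.ModuleList(expanded)
```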
LiteLlama has 460M parameters trained with 1T tokens.
MobiLlama Small Language Models (SLMs), open-source 0.5 billion (0.5B) parameter
Mistral-7B outperforms Llama 2 13B, Apache 2.0 licensed
BakLLaVA mistral + vision model
zephyr: fine-tuned using Direct Preference Optimization
dataset ranked by a teacher model for intent alignment; smaller: 7B vs 70B LLaMA
OpenHermes-2 roleplay, gpt4 dataset
notux chat data
Conditional Adapters Parameter-efficient Transfer Learning with Fast Inference
Training Large Language Models Efficiently with Sparsity and Dataflow
Randomized Positional Encodings Boost Length Generalization of Transformers
MixCE Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies
mixes reverse cross-entropy with the forward cross-entropy used in maximum likelihood estimation (MLE)
Neurons in Large Language Models: Dead, N-gram, Positional
study: in some layers over 70% of neurons are dead; some neurons specialize in removing information from the input
Backpack Language Models: non-contextual sense vectors that specialize in encoding different aspects of a word
In-Context Learning Creates Task Vectors
in-context learning = compressing the demonstrations into a single task vector, then using it to modulate the transformer to produce the output
Efficient Streaming Language Models with Attention Sinks (==better inference or training==)
==a purely sliding-window KV cache fails==; just keep the first tokens around (as is)
or, better, have a static null (sink) token at the beginning of the window
related to the "Vision Transformers Need Registers" paper
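A toy version of the cache policy (a simplification; real implementations evict at the KV-tensor level and adjust positional encodings):

```python
# Sketch of an attention-sink KV cache policy: always keep the first
# `n_sink` tokens plus a sliding window of the most recent tokens, evicting
# everything in between (a simplification of the StreamingLLM cache).
def evict(cache_positions, n_sink=4, window=1020):
    if len(cache_positions) <= n_sink + window:
        return cache_positions
    return cache_positions[:n_sink] + cache_positions[-window:]

positions = list(range(5000))            # pretend 5000 tokens have been cached
kept = evict(positions)
print(len(kept), kept[:6], kept[-3:])    # 1024, starts [0, 1, 2, 3, 3980, 3981]
```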
JetMoE Reaching LLaMA2 Performance with 0.1M Dollars
and can be finetuned with a very limited computing budget
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
probabilistic programming language = commonsense reasoning, linguistics
Quiet-STaR Language Models Can Teach Themselves to Think Before Speaking
learn to generate rationales at each token to explain future text, improving their predictions
LLM-Blender Ensembling Large Language Models with Pairwise Ranking and Generative Fusion
(specialized) text model merging (using rankings)
FuseChat Knowledge Fusion of Chat Models
knowledge fusion for LLMs of structurally diverse architectures and scales
Skeleton-of-Thought Large Language Models Can Do Parallel Decoding
first skeleton, then parallel filling; faster and better
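A sketch with threads standing in for parallel API calls; `llm` is a placeholder blocking completion function.

```python
# Sketch of Skeleton-of-Thought: one call produces a short outline, then each
# point is expanded in parallel and the pieces are concatenated.
from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(llm, question: str) -> str:
    skeleton = llm(
        f"Question: {question}\n"
        "List 3-6 short bullet points outlining the answer, one per line."
    )
    points = [p.strip("-• ").strip() for p in skeleton.splitlines() if p.strip()]
    with ThreadPoolExecutor(max_workers=len(points)) as pool:
        expansions = list(pool.map(
            lambda p: llm(
                f"Question: {question}\nExpand this point in 2-3 sentences: {p}"
            ),
            points,
        ))
    return "\n\n".join(expansions)
```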
ART Automatic multi-step reasoning and tool-use for large language models
bubbles of logic
Orca 2 Teaching Small Language Models How to Reason
reasoning techniques: step-by-step, recall then generate, recall-reason-generate, direct answer
PathFinder Guided Search over Multi-Step Reasoning Paths
tree-search-based reasoning path generation approach (beam search algorithm)
improved commonsense reasoning tasks and complex arithmetic
Stream of Search (SoS): Learning to Search in Language
models can be taught to search by representing the process of search in language, as a flattened string
Teach LLMs to Personalize: An Approach Inspired by Writing Education
retrieval, ranking, summarization, synthesis, and generation
Link-Context Learning for Multimodal LLMs
causal associations between data points = cause and effect
In-Context Learning (ICL) = learn to learn
from limited tasks (providing demonstrations) and generalize to unseen tasks
LoGiPT: Language Models Can Be Logical Solvers
parses natural-language logical questions into symbolic representations and emulates logical solvers
NPM Nonparametric Masked Language Modeling, vs GPT-3, text-corpus based
other code implementations https://www.catalyzex.com/paper/arxiv:2212.01349/code
RAVEN In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models
context learning in retrieval-augmented language models
Copy Is All You Need
task of text generation decomposed into a series of copy-and-paste operations
selects text spans from a corpus rather than tokens from a fixed vocabulary
learning = text compression algorithm ?
Decoding the ACL Paper: Gzip and KNN Rival BERT in Text Classification
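A compact reimplementation of the gzip + kNN idea (normalized compression distance plus majority vote); the toy dataset is made up.

```python
# Minimal gzip + kNN text classifier: compute the Normalized Compression
# Distance between the test text and each training text, then take a
# majority vote over the k nearest neighbors.
import gzip
from collections import Counter

def clen(s: str) -> int:
    return len(gzip.compress(s.encode()))

def ncd(a: str, b: str) -> float:
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(test_text, train_set, k=3):
    # train_set: list of (text, label)
    neighbors = sorted(train_set, key=lambda tl: ncd(test_text, tl[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [("the match ended two to one", "sports"),
         ("the striker scored a late goal", "sports"),
         ("parliament passed the new budget", "politics"),
         ("the senate voted on the bill", "politics")]
print(classify("a goal in the final minute won the game", train))
```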
LLM2Vec Large Language Models Are Secretly Powerful Text Encoders
LLMs can be effectively transformed into universal text encoders without the need for expensive adaptation
int-3 quantization: https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-int-3-and twitter
llama.cpp quantization
AWQ Activation-aware Weight Quantization for LLM Compression and Acceleration
OmniQuant Omnidirectionally Calibrated Quantization for Large Language Models
no more hand-crafted quantization parameters
LLM-FP4 4-Bit Floating-Point Quantized Transformers, 5.8% lower on reasoning than the full-precision model
BiLLM Pushing the Limit of Post-Training Quantization for LLMs
identifies and structurally selects salient weights
quantizes 7 billion weights within 0.5 hours
EasyQuant An Efficient Data-free Quantization Algorithm for LLMs
leave the outliers (less than 1%) unchanged, implemented in parallel
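A NumPy sketch of the outlier-preserving idea (not the paper's exact algorithm: grouping and the data-free scale search are simplified away):

```python
# Sketch of outlier-aware weight quantization in the spirit of EasyQuant:
# keep the rare large-magnitude weights (outliers) in full precision and
# round-trip everything else through a low-bit uniform quantizer.
import numpy as np

def quantize_keep_outliers(w: np.ndarray, bits: int = 4, outlier_frac: float = 0.01):
    thresh = np.quantile(np.abs(w), 1.0 - outlier_frac)
    outliers = np.abs(w) > thresh
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w[~outliers]).max() / qmax          # range set by non-outliers only
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # int grid for the bulk
    deq = q * scale
    deq[outliers] = w[outliers]                        # outliers pass through unchanged
    return deq

w = np.random.randn(4096).astype(np.float32)
w[:4] *= 50                                            # plant a few outliers
err = np.abs(quantize_keep_outliers(w) - w).mean()
print(f"mean abs error: {err:.4f}")
```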
BitNet Scaling 1-bit Transformers for Large Language Models
vs 8-bit quantization architectures
QMoE Practical Sub-1-Bit Compression of Trillion-Parameter Models
can compress 1.6 trillion parameter model to less than 160GB (20x compression, 0.8 bits per parameter)
QA-LoRA Quantization-Aware Low-Rank Adaptation of Large Language Models
QLoRA Efficient Finetuning of Quantized LLMs; 24 hours on 1 GPU with 48GB
LoftQ LoRA-Fine-Tuning-Aware Quantization for Large Language Models
outperforms QLoRA
LLaMA-Adapter Efficient Fine-tuning of Language Models with Zero-init Attention
In-Context Instruction Learning (ICIL)
LoRAShear Efficient Large Language Model Structured Pruning and Knowledge Recovery
distillation
RLHF = Reinforcement Learning from Human Feedback
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (DPO)
can fine-tune LMs to align with human preferences, better than RLHF
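The DPO loss itself is only a few lines; below is a minimal PyTorch version operating on per-sequence log-probabilities (the toy numbers are made up).

```python
# DPO objective: given log-probs of the chosen and rejected responses under
# the policy and the frozen reference model, maximize the margin between the
# two implicit rewards.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# toy usage with made-up log-probabilities for a batch of 2 preference pairs
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -10.0]), torch.tensor([-13.5, -10.5]))
print(loss)
```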
RAD: Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
generation that uses an extra reward model to steer text toward certain properties
ReFT Reasoning with Reinforced Fine-Tuning
learn from multiple annotated reasoning paths
rewards are naturally derived from the ground-truth answers (like math)
TriPosT Teaching Language Models to Self-Improve through Interactive Demonstrations
gives small models the ability to self-improve: revise their own outputs, correcting their own mistakes
Self-Refine Iterative Refinement with Self-Feedback
Fine-Tuning Language Models with Just Forward Passes, less RAM
Full Parameter Fine-tuning for Large Language Models with Limited Resources, low-memory optimizer
EFT: An Emulator for Fine-Tuning Large Language Models using Small Language Models
avoid resource-intensive fine-tuning of LLMs by ensembling them with small fine-tuned models
also: scaling up finetuning improves helpfulness, scaling up pre-training improves factuality
Tuna Instruction Tuning using Feedback from Large Language Models
finetuning with contextual ranking
AutoMix Automatically Mixing Language Models
strategically routes queries to a larger LLM based on the outputs of a smaller LM
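A sketch of the routing idea, with self-verification reduced to a single confidence question; the model calls and the threshold are placeholders.

```python
# Sketch of AutoMix-style routing: a small model answers first, then
# self-verifies; only low-confidence queries are escalated to the large model.
def automix(small_llm, large_llm, question, threshold=0.7):
    draft = small_llm(f"Answer concisely:\n{question}")
    verify = small_llm(
        f"Question: {question}\nProposed answer: {draft}\n"
        "On a scale of 0 to 1, how likely is this answer correct? Reply with a number."
    )
    try:
        confidence = float(verify.strip())
    except ValueError:
        confidence = 0.0           # unparsable verification counts as low confidence
    return draft if confidence >= threshold else large_llm(question)
```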
LoraHub Efficient Cross-Task Generalization via Dynamic LoRA Composition
LoRA composability for cross-task generalization; requires neither extra parameters nor gradients
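A sketch of the composition step: the per-adapter weights would come from a gradient-free optimizer (e.g. CMA-ES) evaluated on a few examples of the new task; the shapes and weights here are illustrative.

```python
# Sketch of LoRA composition in the spirit of LoraHub: combine several trained
# LoRA modules for one layer with scalar weights.
import numpy as np

def compose_lora(adapters, weights):
    # adapters: list of (A, B) with shapes (r, d_in) and (d_out, r)
    # returns the composed low-rank update  delta_W = sum_i w_i * B_i @ A_i
    return sum(w * (B @ A) for w, (A, B) in zip(weights, adapters))

d_in, d_out, r = 16, 16, 4
adapters = [(np.random.randn(r, d_in), np.random.randn(d_out, r)) for _ in range(3)]
delta_w = compose_lora(adapters, weights=[0.6, 0.3, 0.1])
print(delta_w.shape)   # (16, 16)
```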
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
sentence transformers: SetFit
efficient few-shot learning
PEFT w/ Multi LoRA explained (LLM fine-tuning)
the Memorizing Transformer does not need to be pre-trained from scratch; it is possible to add memory to an existing pre-trained model and then fine-tune it
Think Before You Act: Decision Transformers with Internal Working Memory, task-specialized memory
Memory Augmented Language Models through Mixture of Word Experts
Mixture of Word Experts (MoWE) (Mixture-of-Experts (MoE))
a set of word-specific experts plays the role of a sparse memory, with similar performance to more complex memory-augmented models
Fiddler CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
minimize the data movement between the CPU and GPU.
Mixtral-8x7B model, 90GB parameters, over 3 tokens per second on a single GPU with 24GB memory
GaLore Memory-Efficient LLM Training by Gradient Low-Rank Projection ==best==
feasibility of pre-training a 7B model on GPUs with 24GB memory; unlike LoRA
82.5% reduction in memory
Augmenting Language Models with Long-Term Memory (unlimited context)
YaRN Efficient Context Window Extension of Large Language Models
Efficient Memory Management for Large Language Model Serving with PagedAttention
vLLM: near-zero waste in KV cache memory, and flexible
Flash-Decoding make long-context LLM inference up to 8x faster
load the KV cache in parallel as fast as possible, then separately rescale to combine the results
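The combine step is just a log-sum-exp rescaling of per-chunk partial results; below is a NumPy sketch for a single query vector (the real kernel does this on GPU across the batch and heads).

```python
# Sketch of the Flash-Decoding combine step: attention over the KV cache is
# computed independently per chunk, then the partial outputs are rescaled with
# their softmax statistics (running max and sum of exponentials) and merged
# into the exact full-attention result.
import numpy as np

def chunked_attention(q, K, V, n_chunks=4):
    outs, maxes, sums = [], [], []
    for Kc, Vc in zip(np.array_split(K, n_chunks), np.array_split(V, n_chunks)):
        scores = Kc @ q / np.sqrt(q.shape[0])   # (chunk_len,)
        m = scores.max()
        e = np.exp(scores - m)
        outs.append(e @ Vc)                     # un-normalized partial output
        maxes.append(m)
        sums.append(e.sum())
    m_all = max(maxes)
    rescale = [np.exp(m - m_all) for m in maxes]
    denom = sum(s * r for s, r in zip(sums, rescale))
    return sum(o * r for o, r in zip(outs, rescale)) / denom

d, seq = 8, 64
q, K, V = np.random.randn(d), np.random.randn(seq, d), np.random.randn(seq, d)
scores = K @ q / np.sqrt(d)
ref = (np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()) @ V
print(np.allclose(chunked_attention(q, K, V), ref))   # True
```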
Infinite-LLM Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
LLM serving system dynamically managing KV Cache, orchestrates across the data center
Extending LLMs' Context Window with 100 Samples
introduce a novel extension to RoPE so that it can adapt to larger context windows (efficiently)
demonstrated on LLaMA
LIMA: Less Is More for Alignment
trained on only 1,000 carefully curated prompts and responses
q2d Turning Questions into Dialogs to Teach Models How to Search
synthetically generated data achieves 90%-97% of the performance of training on human-generated data
Impossible Distillation from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing
high-quality model and dataset from a low-quality teacher model
Simple synthetic data reduces sycophancy in large language models
sycophancy = the model adapting its answers to a user's revealed views, even endorsing statements that are objectively incorrect
lightweight finetuning step
GPT Can Solve Mathematical Problems Without a Calculator; with training data = multi-digit arithmetic
TeacherLM Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
annotating the dataset with "why" instead of only "what"
LeMa: Learning From Mistakes Makes LLM Better Reasoner
identify, explain, and correct mistakes using the LLM itself, then finetune on them (learn from mistakes)
Ziya2 Data-centric Learning is All LLMs Need
focuses on pre-training techniques and data-centric optimization to enhance the learning process