Open Source Models but for new Hardware
cpp generation library and list of supported models (GPT, RWKV): ggml
Language Model Inversion
given the output, reconstruct the original prompt
LoRA or QLoRA
llama plugins: https://twitter.com/algo_diver/status/1639681733468753925
llama tools: https://github.com/OpenBMB/ToolBench
streaming vs non-streaming generation
langchain, and https://github.com/srush/MiniChain
PEARL Prompting Large Language Models to Plan and Execute Actions Over Long Documents
MemGPT: manages memory tiers to effectively provide extended context within the LLM's limited context window
LLMs taught to manage their own memory, resembling paging in an OS (main context, external context) ==best==
trained to generate function calls
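A minimal sketch of the paging idea (not the MemGPT API; class and function names here are made up): a bounded main context plus an unbounded external archive, with tool-style functions the model can call to search or store.

```python
# Minimal sketch of MemGPT-style memory paging (hypothetical names, not the
# MemGPT API): the model's function calls move data between a bounded
# "main context" and an unbounded "external context".
from collections import deque

class TieredMemory:
    def __init__(self, main_budget_tokens=2000):
        self.main = deque()          # messages currently inside the context window
        self.archive = []            # external storage, searchable, unbounded
        self.budget = main_budget_tokens

    def _tokens(self):
        return sum(len(m.split()) for m in self.main)  # crude token count

    def add(self, message):
        self.main.append(message)
        while self._tokens() > self.budget:            # "page out" oldest items
            self.archive.append(self.main.popleft())

    # functions the LLM is trained/prompted to call:
    def archive_search(self, query, k=3):
        hits = [m for m in self.archive if query.lower() in m.lower()]
        return hits[:k]

    def archive_insert(self, note):
        self.archive.append(note)

memory = TieredMemory(main_budget_tokens=6)   # tiny budget to force paging in this demo
memory.add("user: my dog is named Bruno")
memory.add("assistant: noted!")
# the early message has been paged out of main context but remains retrievable:
print(memory.archive_search("dog"))
```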
ClinicalCamel An Open-Source Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding; medical, doctor
Personality Traits in Large Language Models, quantifying personalities
ChipNeMo Domain-Adapted LLMs for Chip Design
LARP Language-Agent Role Play for Open-World Games
decision-making assistant, framework refines interactions between users and agents
PosterLlama Bridging Design Ability of Language Model to Content-Aware Layout Generation
reformatting layout elements into HTML code
unconditional layout generation, element conditional layout generation, layout completion
Pix2Struct: image-to-text (screenshot/plot parsing)
DePlot: plot-to-text model helping LLMs understand plots
MatCha: great chart & math capabilities by plot deconstruction & numerical reasoning objectives
StructLM Towards Building Generalist Models for Structured Knowledge Grounding
based on the Code-LLaMA architecture
SaulLM-7B A pioneering Large Language Model for Law
designed explicitly for legal text comprehension and generation
Pixel Aligned Language Models
can take locations (set of points, boxes) as inputs or outputs
location-aware vision-language tasks
CrossCodeEval A Diverse and Multilingual Benchmark for Cross-File Code Completion
cross-file contextual understanding
Mixtral-8x7B > CodeLlama-34B (on HumanEval)
Llemma An Open Language Model For Mathematics
capable of tool use and formal theorem proving
Large Language Models for Mathematicians (academic)
mathematical description of the transformer model used in all modern language models
Chronos Learning the Language of Time Series
improve zero-shot accuracy on unseen forecasting tasks; forecasting pipeline
MathVerse Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
extract crucial reasoning steps, to reveal the intermediate reasoning quality
MLLMs
DeciCoder decoder-only code completion model
approach of grouping tokens into clusters and having each token attend to others only within its cluster
Magicoder: Source Code Is All You Need
MagicoderS-CL-7B based on CodeLlama
StepCoder Improve Code Generation with Reinforcement Learning from Compiler Feedback
breaks the long-sequence code generation task into a curriculum of code completion subtasks
while masking the unexecuted code segments for fine-grained optimization
Enhancing Network Management Using Code Generated by Large Language Models
program synthesis: generate task-specific code from natural language queries
analyzing network topologies and communication graphs
CodeFusion A Pre-trained Diffusion Model for Code Generation ==diffusion== (75M vs 1B auto-regressive)
iterative denoising, no need to start from scratch
Text Rendering Strategies for Pixel Language Models
characters as images, handle any script; PIXEL model
Grammar Prompting for Domain-Specific Language Generation with Large Language Models
like programming languages
predicts a BNF grammar given an input, then generates the output according to the rules of that grammar
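A rough sketch of the two-step prompting, assuming a generic `llm(prompt)` completion function (a placeholder, not a real API); a real system would additionally enforce the predicted grammar with a constrained decoder or a parser check.

```python
# Sketch of grammar prompting: first ask for a minimal BNF grammar, then
# generate the DSL output under that grammar. `llm` is a placeholder.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model call here")

def grammar_prompted_generation(nl_query: str) -> str:
    # Step 1: predict a minimal BNF grammar specialized to this input.
    grammar = llm(
        "Write a minimal BNF grammar covering only the constructs needed to "
        f"answer the following request in the target DSL:\n{nl_query}"
    )
    # Step 2: generate the program, constrained (here only via the prompt)
    # to follow that grammar.
    return llm(
        f"Grammar:\n{grammar}\n\n"
        f"Request: {nl_query}\n"
        "Produce output that strictly follows the grammar above."
    )
```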
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
zero-shot prompts with only documentation are sufficient for tool usage
tool documentation > demonstrations
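A minimal illustration of a documentation-only prompt, with made-up tool names and no demonstrations:

```python
# Sketch: zero-shot tool use from documentation alone (no demonstrations).
# The tool names and prompt wording are illustrative, not from the paper.
TOOL_DOCS = """
search(query: str) -> str
    Returns the top web result for `query`.
calculator(expression: str) -> float
    Evaluates an arithmetic expression.
"""

def build_prompt(user_request: str) -> str:
    return (
        "You can call the following tools. Reply with a single call like "
        "tool_name(arguments).\n\n"
        f"{TOOL_DOCS}\n"
        f"User request: {user_request}"
    )

print(build_prompt("What is 17% of 2,340?"))
```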
ControlLLM Augment Language Models with Tools by Searching on Graphs
breaks down a complex task into clear subtasks, then searches a tool graph for the optimal solution path
Fay integrating language models and digital characters
elit provides NLP tools for tokenization, tagging, and recognition across languages
translation prompt: https://boards.4channel.org/g/thread/92468569#p92470651
EMMA Efficient Monotonic Multihead Attention
simultaneous speech-to-text translation on the Spanish and English translation task
OPRO: Optimization by PROmpting, Large Language Models as Optimizers
each step = generate new solutions from the previously generated solutions and their scores
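A sketch of that optimization loop; `llm` and `score` are placeholders for the optimizer model call and the task evaluator, and the prompt wording is an approximation.

```python
# Sketch of the OPRO loop: at each step the optimizer LLM sees previous
# (solution, score) pairs and proposes new candidates.
def opro(llm, score, n_steps=10, per_step=4):
    history = []  # list of (solution_text, score)
    for _ in range(n_steps):
        trajectory = "\n".join(
            f"text: {s}\nscore: {v}"
            for s, v in sorted(history, key=lambda x: x[1])[-20:]
        )
        prompt = (
            "Here are previous solutions with their scores (higher is better):\n"
            f"{trajectory}\n"
            "Propose a new, different solution that scores higher."
        )
        for _ in range(per_step):
            candidate = llm(prompt)
            history.append((candidate, score(candidate)))
    return max(history, key=lambda x: x[1])
```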
Large Language Models for Compiler Optimization
reducing instruction counts beyond what the compiler achieves alone
EvoPrompt Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
SparQ Attention: Bandwidth-Efficient LLM Inference
reducing memory bandwidth requirements within the attention blocks through selective fetching of the cached history (up to an 8x reduction)
thread summarizer https://labs.kagi.com/ai/sum?url=%3E%3E248633369
LLM Use Case: Summarization (using langchain)
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
iteratively incorporating missing salient entities without increasing the length
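A minimal sketch of the densification loop; `llm` is a placeholder chat call and the prompt wording is an approximation of the paper's.

```python
# Sketch of Chain-of-Density summarization: iteratively fold in missing
# salient entities while keeping the summary length roughly fixed.
def chain_of_density(llm, article: str, rounds: int = 5, words: int = 80) -> str:
    summary = llm(f"Summarize in about {words} words:\n{article}")
    for _ in range(rounds):
        summary = llm(
            f"Article:\n{article}\n\nCurrent summary:\n{summary}\n\n"
            "Identify 1-3 informative entities from the article that are "
            "missing from the summary, then rewrite the summary to include "
            f"them WITHOUT exceeding {words} words (fuse and compress, do not "
            "drop previously included entities)."
        )
    return summary
```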
LMDX Language Model-based Document Information Extraction and Localization
methodology to adapt arbitrary LLMs for document information extraction (without hallucination)
parent: diffusion
GENIE Large Scale Pre-training for Text Generation with Diffusion Model
TESS Text-to-Text Self-Conditioned Simplex Diffusion
AR-Diffusion Auto-Regressive Diffusion Model for Text Generation
PLANNER Generating Diversified Paragraph via Latent Language Diffusion Model
DiffusionDialog A Diffusion Model for Diverse Dialog Generation with Latent Space
enhances the diversity of dialog responses while maintaining coherence
allenai / OLMo: an actually open-source AI model
PASTA Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
identifies a small subset of attention heads, then applies precise attention reweighting on them
can be applied in addition to prompting
S2A: System 2 Attention (is something you might need too)
regenerates context to only include the relevant portions before responding
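A two-call sketch of the idea; `llm` is a placeholder and the filtering prompt is paraphrased, not the paper's exact wording.

```python
# Sketch of System 2 Attention: first ask the model to rewrite the context,
# keeping only the parts relevant to the question, then answer using the
# regenerated context only.
def s2a_answer(llm, context: str, question: str) -> str:
    filtered = llm(
        "Extract from the text below only the parts that are relevant and "
        "factual for answering the question; drop opinions, flattery, and "
        f"irrelevant details.\n\nText:\n{context}\n\nQuestion: {question}"
    )
    return llm(f"Context:\n{filtered}\n\nQuestion: {question}\nAnswer:")
```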
AcceleratingLLM Inference with Staged Speculative Decoding
restructure the speculative batch as a tree
MobileNMT Enabling Translation in 15MB and 30ms
FlashDecoding++ Faster Large Language Model Inference on GPUs
inference engine, 2-4x speedup; optimized handling of flat GEMM operations
Exponentially Faster Language Modelling
replacing feedforward networks with fast feedforward networks (FFFs)
engages just 12 out of 4095 neurons for each layer inference, 78x speedup
EAGLE: LLM decoding based on compression (compared with others: Medusa, Lookahead, vanilla)
sequence of second-top-layer features is compressible, making the prediction of subsequent feature vectors from previous ones easy by a small model
jina-embeddings-v2: 8k context length, BERT architecture
LLaMa ipfs
in browser (there is also the cpp one)
train all Llama-2 models on your own data
Open LLaMA Open-Source Reproduction, permissively licensed; Lit-LLaMA, RedPajama dataset
Falcon new family, open-source ==instruct finetuned too==
LLaMA Pro Progressive LLaMA with Block Expansion
take a pretrained model, freeze its params, then add new blocks
tune the model on new data without forgetting the old
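A PyTorch sketch of block expansion under these assumptions (the zeroed projection names follow Hugging Face LLaMA conventions and may differ for other architectures); this is an illustration, not the paper's code.

```python
# Sketch of block expansion in the spirit of LLaMA Pro: freeze the pretrained
# blocks, interleave newly added blocks initialized to act as (near-)identity,
# and train only the new blocks on the new-domain data.
import copy
import torch.nn as nn

def expand(model_blocks: nn.ModuleList, every: int = 4) -> nn.ModuleList:
    expanded = []
    for i, block in enumerate(model_blocks):
        block.requires_grad_(False)            # freeze pretrained weights
        expanded.append(block)
        if (i + 1) % every == 0:
            new_block = copy.deepcopy(block)   # same shape as its neighbor
            for name, p in new_block.named_parameters():
                # zero the output projections so the block starts as identity
                # (parameter names depend on the architecture)
                if "o_proj" in name or "down_proj" in name:
                    nn.init.zeros_(p)
                p.requires_grad_(True)         # only new blocks are trained
            expanded.append(new_block)
    return nn.ModuleList(expanded)
```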
LiteLlama has 460M parameters trained with 1T tokens.
MobiLlama Small Language Models (SLMs), open-source 0.5 billion (0.5B) parameter
Mistral-7B outperforms Llama 2 13B, Apache 2.0 licensed
BakLLaVA mistral + vision model
zephyr: fine-tuned using Direct Preference Optimization
dataset ranked by a teacher model for intent alignment; smaller: 7B vs 70B LLaMA
OpenHermes-2 roleplay, gpt4 dataset
notux chat data
Conditional Adapters Parameter-efficient Transfer Learning with Fast Inference
Training Large Language Models Efficiently with Sparsity and Dataflow
Randomized Positional Encodings Boost Length Generalization of Transformers
MixCE Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies
mixes reverse cross-entropy with the forward cross-entropy used in maximum likelihood estimation (MLE)
Neurons in Large Language Models: Dead, N-gram, Positional
study: in some layers over 70% of neurons are dead; some neurons specialize in removing information from the input
Backpack Language Models: non-contextual sense vectors that specialize in encoding different aspects of a word
In-Context Learning Creates Task Vectors
in-context learning = compressing the demonstrations into a single task vector, then using it to modulate the transformer to produce the output
Efficient Streaming Language Models with Attention Sinks (==better inference or training==)
==a purely sliding-window KV cache fails==; just keep the first tokens around (as is)
or, better, have a static null (sink) token at the beginning of the window
related to the "Vision Transformers Need Registers" paper
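A toy version of the cache policy (a simplification; real implementations evict at the KV-tensor level and adjust positional encodings):

```python
# Sketch of an attention-sink KV cache policy: always keep the first
# `n_sink` tokens plus a sliding window of the most recent tokens, evicting
# everything in between (a simplification of the StreamingLLM cache).
def evict(cache_positions, n_sink=4, window=1020):
    if len(cache_positions) <= n_sink + window:
        return cache_positions
    return cache_positions[:n_sink] + cache_positions[-window:]

positions = list(range(5000))            # pretend 5000 tokens have been cached
kept = evict(positions)
print(len(kept), kept[:6], kept[-3:])    # 1024, starts [0, 1, 2, 3, 3980, 3981]
```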
JetMoE Reaching LLaMA2 Performance with 0.1M Dollars
and can be finetuned with a very limited computing budget
From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
probabilistic programming language = commonsense reasoning, linguistics
Quiet-STaR Language Models Can Teach Themselves to Think Before Speaking
learn to generate rationales at each token to explain future text, improving their predictions
LLM-Blender Ensembling Large Language Models with Pairwise Ranking and Generative Fusion
(specialized) text model merging (using rankings)
FuseChat Knowledge Fusion of Chat Models
knowledge fusion for LLMs of structurally diverse architectures and scales
Skeleton-of-Thought Large Language Models Can Do Parallel Decoding
first skeleton, then parallel filling; faster and better
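A sketch with threads standing in for parallel API calls; `llm` is a placeholder blocking completion function.

```python
# Sketch of Skeleton-of-Thought: one call produces a short outline, then each
# point is expanded in parallel and the pieces are concatenated.
from concurrent.futures import ThreadPoolExecutor

def skeleton_of_thought(llm, question: str) -> str:
    skeleton = llm(
        f"Question: {question}\n"
        "List 3-6 short bullet points outlining the answer, one per line."
    )
    points = [p.strip("-• ").strip() for p in skeleton.splitlines() if p.strip()]
    with ThreadPoolExecutor(max_workers=len(points)) as pool:
        expansions = list(pool.map(
            lambda p: llm(
                f"Question: {question}\nExpand this point in 2-3 sentences: {p}"
            ),
            points,
        ))
    return "\n\n".join(expansions)
```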
ART Automatic multi-step reasoning and tool-use for large language models
bubbles of logic
Orca 2 Teaching Small Language Models How to Reason
reasoning techniques: step-by-step, recall then generate, recall-reason-generate, direct answer
PathFinder Guided Search over Multi-Step Reasoning Paths
tree-search-based reasoning path generation approach (beam search algorithm)
improved commonsense reasoning tasks and complex arithmetic
Stream of Search (SoS): Learning to Search in Language
models can be taught to search by representing the process of search in language, as a flattened string
Teach LLMs to Personalize: An Approach Inspired by Writing Education
retrieval, ranking, summarization, synthesis, and generation
Link-Context Learning for Multimodal LLMs
causal associations between data points = cause and effect
In-Context Learning (ICL) = learn to learn
from limited tasks (providing demonstrations) and generalize to unseen tasks
LoGiPT: Language Models Can Be Logical Solvers
parses natural-language logical questions into symbolic representations and emulates logical solvers
NPM Nonparametric Masked Language Modeling, vs GPT-3, text-corpus based
other code implementations https://www.catalyzex.com/paper/arxiv:2212.01349/code
RAVEN In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models
context learning in retrieval-augmented language models
Copy Is All You Need
task of text generation decomposed into a series of copy-and-paste operations
selects text spans from a corpus rather than tokens from a fixed vocabulary
learning = text compression algorithm ?
Decoding the ACL Paper: Gzip and KNN Rival BERT in Text Classification
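A compact reimplementation of the gzip + kNN idea (normalized compression distance plus majority vote); the toy dataset is made up.

```python
# Minimal gzip + kNN text classifier: compute the Normalized Compression
# Distance between the test text and each training text, then take a
# majority vote over the k nearest neighbors.
import gzip
from collections import Counter

def clen(s: str) -> int:
    return len(gzip.compress(s.encode()))

def ncd(a: str, b: str) -> float:
    ca, cb, cab = clen(a), clen(b), clen(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(test_text, train_set, k=3):
    # train_set: list of (text, label)
    neighbors = sorted(train_set, key=lambda tl: ncd(test_text, tl[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [("the match ended two to one", "sports"),
         ("the striker scored a late goal", "sports"),
         ("parliament passed the new budget", "politics"),
         ("the senate voted on the bill", "politics")]
print(classify("a goal in the final minute won the game", train))
```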
LLM2Vec Large Language Models Are Secretly Powerful Text Encoders
LLMs can be effectively transformed into universal text encoders without the need for expensive adaptation
int-3 quantization: https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-int-3-and twitter
llama.cpp quantization
AWQ Activation-aware Weight Quantization for LLM Compression and Acceleration
OmniQuant Omnidirectionally Calibrated Quantization for Large Language Models
no more hand-crafted quantization parameters
LLM-FP4 4-Bit Floating-Point Quantized Transformers, 5.8% lower on reasoning than the full-precision model
BiLLM Pushing the Limit of Post-Training Quantization for LLMs
identifies and structurally selects salient weights
quantizes 7 billion weights within 0.5 hours
EasyQuant An Efficient Data-free Quantization Algorithm for LLMs
leave the outliers (less than 1%) unchanged, implemented in parallel
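A NumPy sketch of the outlier-preserving idea (not the paper's exact algorithm: grouping and the data-free scale search are simplified away):

```python
# Sketch of outlier-aware weight quantization in the spirit of EasyQuant:
# keep the rare large-magnitude weights (outliers) in full precision and
# round-trip everything else through a low-bit uniform quantizer.
import numpy as np

def quantize_keep_outliers(w: np.ndarray, bits: int = 4, outlier_frac: float = 0.01):
    thresh = np.quantile(np.abs(w), 1.0 - outlier_frac)
    outliers = np.abs(w) > thresh
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w[~outliers]).max() / qmax          # range set by non-outliers only
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)  # int grid for the bulk
    deq = q * scale
    deq[outliers] = w[outliers]                        # outliers pass through unchanged
    return deq

w = np.random.randn(4096).astype(np.float32)
w[:4] *= 50                                            # plant a few outliers
err = np.abs(quantize_keep_outliers(w) - w).mean()
print(f"mean abs error: {err:.4f}")
```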
BitNet Scaling 1-bit Transformers for Large Language Models
vs 8-bit quantization architectures
QMoE Practical Sub-1-Bit Compression of Trillion-Parameter Models
can compress 1.6 trillion parameter model to less than 160GB (20x compression, 0.8 bits per parameter)
QA-LoRA Quantization-Aware Low-Rank Adaptation of Large Language Models
QLoRA Efficient Finetuning of Quantized LLMs; 24 hours on 1 GPU with 48GB
LoftQ LoRA-Fine-Tuning-Aware Quantization for Large Language Models
outperforms QLoRA
LLaMA-Adapter Efficient Fine-tuning of Language Models with Zero-init Attention
In-Context Instruction Learning (ICIL)
LoRAShear Efficient Large Language Model Structured Pruning and Knowledge Recovery
distillation
RLHF = Reinforcement Learning from Human Feedback
Direct Preference Optimization: Your Language Model is Secretly a Reward Model (DPO)
can fine-tune LMs to align with human preferences, better than RLHF
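The DPO loss itself is only a few lines; below is a minimal PyTorch version operating on per-sequence log-probabilities (the toy numbers are made up).

```python
# DPO objective: given log-probs of the chosen and rejected responses under
# the policy and the frozen reference model, maximize the margin between the
# two implicit rewards.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# toy usage with made-up log-probabilities for a batch of 2 preference pairs
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
                torch.tensor([-12.5, -10.0]), torch.tensor([-13.5, -10.5]))
print(loss)
```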
RAD: Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
generation that uses an extra reward model to steer text toward certain properties
ReFT Reasoning with Reinforced Fine-Tuning
learn from multiple annotated reasoning paths
rewards are naturally derived from the ground-truth answers (like math)
TriPosT Teaching Language Models to Self-Improve through Interactive Demonstrations
gives small models the ability to self-improve: revise their own outputs, correcting their own mistakes
Self-Refine Iterative Refinement with Self-Feedback
Fine-Tuning Language Models with Just Forward Passes, less RAM
Full Parameter Fine-tuning for Large Language Models with Limited Resources, low-memory optimizer
EFT: An Emulator for Fine-Tuning Large Language Models using Small Language Models
avoid resource-intensive fine-tuning of LLMs by ensembling them with small fine-tuned models
also: scaling up finetuning improves helpfulness, scaling up pre-training improves factuality
Tuna Instruction Tuning using Feedback from Large Language Models
finetuning with contextual ranking
AutoMix Automatically Mixing Language Models
strategically routes queries to a larger LLM based on the outputs of a smaller LM
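A sketch of the routing idea, with self-verification reduced to a single confidence question; the model calls and the threshold are placeholders.

```python
# Sketch of AutoMix-style routing: a small model answers first, then
# self-verifies; only low-confidence queries are escalated to the large model.
def automix(small_llm, large_llm, question, threshold=0.7):
    draft = small_llm(f"Answer concisely:\n{question}")
    verify = small_llm(
        f"Question: {question}\nProposed answer: {draft}\n"
        "On a scale of 0 to 1, how likely is this answer correct? Reply with a number."
    )
    try:
        confidence = float(verify.strip())
    except ValueError:
        confidence = 0.0           # unparsable verification counts as low confidence
    return draft if confidence >= threshold else large_llm(question)
```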
LoraHub Efficient Cross-Task Generalization via Dynamic LoRA Composition
LoRA composability for cross-task generalization; requires neither extra parameters nor gradients
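A sketch of the composition step: the per-adapter weights would come from a gradient-free optimizer (e.g. CMA-ES) evaluated on a few examples of the new task; the shapes and weights here are illustrative.

```python
# Sketch of LoRA composition in the spirit of LoraHub: combine several trained
# LoRA modules for one layer with scalar weights.
import numpy as np

def compose_lora(adapters, weights):
    # adapters: list of (A, B) with shapes (r, d_in) and (d_out, r)
    # returns the composed low-rank update  delta_W = sum_i w_i * B_i @ A_i
    return sum(w * (B @ A) for w, (A, B) in zip(weights, adapters))

d_in, d_out, r = 16, 16, 4
adapters = [(np.random.randn(r, d_in), np.random.randn(d_out, r)) for _ in range(3)]
delta_w = compose_lora(adapters, weights=[0.6, 0.3, 0.1])
print(delta_w.shape)   # (16, 16)
```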
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
sentence transformers: SetFit
efficient few-shot learning
PEFT w/ Multi LoRA explained (LLM fine-tuning)
the Memorizing Transformer does not need to be pre-trained from scratch; it is possible to add memory to an existing pre-trained model and then fine-tune it
Think Before You Act: Decision Transformers with Internal Working Memory, task-specialized memory
Memory Augmented Language Models through Mixture of Word Experts
Mixture of Word Experts (MoWE) (Mixture-of-Experts (MoE))
a set of word-specific experts plays the role of a sparse memory, with similar performance to more complex memory-augmented models
Fiddler CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
minimize the data movement between the CPU and GPU.
Mixtral-8x7B model, 90GB parameters, over 3 tokens per second on a single GPU with 24GB memory
GaLore Memory-Efficient LLM Training by Gradient Low-Rank Projection ==best==
feasibility of pre-training a 7B model on GPUs with 24GB memory; unlike LoRA
82.5% reduction in memory
Augmenting Language Models with Long-Term Memory (unlimited context)
YaRN Efficient Context Window Extension of Large Language Models
Efficient Memory Management for Large Language Model Serving with PagedAttention
vLLM: near-zero waste in KV cache memory, and flexible
Flash-Decoding make long-context LLM inference up to 8x faster
load the KV cache in parallel as fast as possible, then separately rescale to combine the results
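The combine step is just a log-sum-exp rescaling of per-chunk partial results; below is a NumPy sketch for a single query vector (the real kernel does this on GPU across the batch and heads).

```python
# Sketch of the Flash-Decoding combine step: attention over the KV cache is
# computed independently per chunk, then the partial outputs are rescaled with
# their softmax statistics (running max and sum of exponentials) and merged
# into the exact full-attention result.
import numpy as np

def chunked_attention(q, K, V, n_chunks=4):
    outs, maxes, sums = [], [], []
    for Kc, Vc in zip(np.array_split(K, n_chunks), np.array_split(V, n_chunks)):
        scores = Kc @ q / np.sqrt(q.shape[0])   # (chunk_len,)
        m = scores.max()
        e = np.exp(scores - m)
        outs.append(e @ Vc)                     # un-normalized partial output
        maxes.append(m)
        sums.append(e.sum())
    m_all = max(maxes)
    rescale = [np.exp(m - m_all) for m in maxes]
    denom = sum(s * r for s, r in zip(sums, rescale))
    return sum(o * r for o, r in zip(outs, rescale)) / denom

d, seq = 8, 64
q, K, V = np.random.randn(d), np.random.randn(seq, d), np.random.randn(seq, d)
scores = K @ q / np.sqrt(d)
ref = (np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()) @ V
print(np.allclose(chunked_attention(q, K, V), ref))   # True
```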
Infinite-LLM Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
LLM serving system dynamically managing KV Cache, orchestrates across the data center
Extending LLMs' Context Window with 100 Samples
introduce a novel extension to RoPE so that it can adapt to larger context windows (efficiently)
demonstrated on LLaMA
LIMA: Less Is More for Alignment
trained on only 1,000 carefully curated prompts and responses
q2d Turning Questions into Dialogs to Teach Models How to Search
synthetically generated data achieves 90%-97% of the performance of training on human-generated data
Impossible Distillation from Low-Quality Model to High-Quality Dataset & Model for Summarization and Paraphrasing
high-quality model and dataset from a low-quality teacher model
Simple synthetic data reduces sycophancy in large language models
sycophancy = the model adapting its answers to a user's revealed views, even endorsing statements that are objectively incorrect
lightweight finetuning step
GPT Can Solve Mathematical Problems Without a Calculator; with training data = multi-digit arithmetic
TeacherLM Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise
annotating the dataset with "why" instead of only "what"
LeMa: Learning From Mistakes Makes LLM Better Reasoner
identify, explain, and correct mistakes using the LLM itself, then finetune on them (learn from mistakes)
Ziya2 Data-centric Learning is All LLMs Need
focuses on pre-training techniques and data-centric optimization to enhance the learning process