Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length. arXiv:2404.08801 (Apr 12, 2024)
Ring Attention with Blockwise Transformers for Near-Infinite Context. arXiv:2310.01889 (Oct 3, 2023)
World Model on Million-Length Video And Language With RingAttention. arXiv:2402.08268 (Feb 13, 2024)
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention. arXiv:2404.07143 (Apr 10, 2024)
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens. arXiv:2402.13753 (Feb 21, 2024)
RULER: What's the Real Context Size of Your Long-Context Language Models? arXiv:2404.06654 (Apr 9, 2024)
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon. arXiv:2401.03462 (Jan 7, 2024)
Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization. arXiv:2401.07793 (Jan 15, 2024)
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization. arXiv:2401.18079 (Jan 31, 2024)
LongNet: Scaling Transformers to 1,000,000,000 Tokens. arXiv:2307.02486 (Jul 5, 2023)
YaRN: Efficient Context Window Extension of Large Language Models. arXiv:2309.00071 (Aug 31, 2023)
Compressed Context Memory For Online Language Model Interaction. arXiv:2312.03414 (Dec 6, 2023)