- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 143
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 11
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 50
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 44

Collections including paper arxiv:2404.07839

- FlowMind: Automatic Workflow Generation with LLMs
  Paper • 2404.13050 • Published • 32
- The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
  Paper • 2404.13208 • Published • 38
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
  Paper • 2404.07839 • Published • 41
- AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation
  Paper • 2404.12753 • Published • 41

- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
  Paper • 2404.08801 • Published • 63
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
  Paper • 2404.07839 • Published • 41
- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
  Paper • 2404.05892 • Published • 31
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 138

- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 80
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 60
- StarCoder: may the source be with you!
  Paper • 2305.06161 • Published • 29
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 56

- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
  Paper • 2404.07839 • Published • 41
- Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
  Paper • 2307.05695 • Published • 22
- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 84
- Pre-training Small Base LMs with Fewer Tokens
  Paper • 2404.08634 • Published • 34

- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
  Paper • 2404.07839 • Published • 41
- Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
  Paper • 2404.03715 • Published • 60
- MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
  Paper • 2404.05674 • Published • 13
- Agentless: Demystifying LLM-based Software Engineering Agents
  Paper • 2407.01489 • Published • 42

- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
  Paper • 2404.05892 • Published • 31
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 138
- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
  Paper • 2404.07839 • Published • 41
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
  Paper • 2404.07143 • Published • 103

- RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
  Paper • 2404.07839 • Published • 41
- Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
  Paper • 2404.05892 • Published • 31
- Jamba: A Hybrid Transformer-Mamba Language Model
  Paper • 2403.19887 • Published • 104
- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52