- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
  Paper • 2206.10789 • Published • 4
- Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
  Paper • 2401.00448 • Published • 28
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 10
- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 6

Collections including paper arxiv:2401.00448
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 50
- Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
  Paper • 2401.10774 • Published • 53
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 143
- Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding
  Paper • 2401.12954 • Published • 28

- Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
  Paper • 2401.00448 • Published • 28
- Improving Text Embeddings with Large Language Models
  Paper • 2401.00368 • Published • 79
- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models
  Paper • 2401.06951 • Published • 24
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 78

- Orca 2: Teaching Small Language Models How to Reason
  Paper • 2311.11045 • Published • 70
- ToolTalk: Evaluating Tool-Usage in a Conversational Setting
  Paper • 2311.10775 • Published • 7
- Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
  Paper • 2311.11077 • Published • 24
- MultiLoRA: Democratizing LoRA for Better Multi-Task Learning
  Paper • 2311.11501 • Published • 33

- CodeFusion: A Pre-trained Diffusion Model for Code Generation
  Paper • 2310.17680 • Published • 69
- What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning
  Paper • 2312.15685 • Published • 17
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer
  Paper • 2401.01055 • Published • 54
- Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
  Paper • 2401.00448 • Published • 28

- Attention Is All You Need
  Paper • 1706.03762 • Published • 44
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 8
- Mixtral of Experts
  Paper • 2401.04088 • Published • 157
- Mistral 7B
  Paper • 2310.06825 • Published • 47

- S^{3}: Increasing GPU Utilization during Generative Inference for Higher Throughput
  Paper • 2306.06000 • Published • 1
- Fast Distributed Inference Serving for Large Language Models
  Paper • 2305.05920 • Published • 1
- Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline
  Paper • 2305.13144 • Published • 1
- Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference
  Paper • 2303.06182 • Published • 1

- FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning
  Paper • 2309.04663 • Published • 5
- Textbooks Are All You Need II: phi-1.5 technical report
  Paper • 2309.05463 • Published • 87
- Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
  Paper • 2310.08541 • Published • 17
- Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models
  Paper • 2310.13671 • Published • 18