Collections
Discover the best community collections!
Collections including paper arxiv:2412.19437
-
Video Creation by Demonstration
Paper • 2412.09551 • Published • 8 -
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation
Paper • 2412.07589 • Published • 45 -
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation
Paper • 2412.06531 • Published • 71 -
APOLLO: SGD-like Memory, AdamW-level Performance
Paper • 2412.05270 • Published • 38
-
Phi-4 Technical Report
Paper • 2412.08905 • Published • 101 -
Evaluating and Aligning CodeLLMs on Human Preference
Paper • 2412.05210 • Published • 47 -
Evaluating Language Models as Synthetic Data Generators
Paper • 2412.03679 • Published • 46 -
Yi-Lightning Technical Report
Paper • 2412.01253 • Published • 25
-
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
Paper • 2401.02954 • Published • 41 -
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Paper • 2401.06066 • Published • 44 -
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Paper • 2401.14196 • Published • 48 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 77
-
Qwen2.5-Coder Technical Report
Paper • 2409.12186 • Published • 140 -
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement
Paper • 2409.12122 • Published • 3 -
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 14 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 77
-
Beyond Language Models: Byte Models are Digital World Simulators
Paper • 2402.19155 • Published • 49 -
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Paper • 2402.19427 • Published • 52 -
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 44 -
Resonance RoPE: Improving Context Length Generalization of Large Language Models
Paper • 2403.00071 • Published • 22