Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers Paper • 2408.06195 • Published Aug 12, 2024 • 70
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 610
Orca-Math: Unlocking the potential of SLMs in Grade School Math Paper • 2402.14830 • Published Feb 16, 2024 • 25
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition Paper • 2402.15504 • Published Feb 23, 2024 • 22
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement Paper • 2402.14658 • Published Feb 22, 2024 • 82
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback Paper • 2402.01391 • Published Feb 2, 2024 • 42
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research Paper • 2402.00159 • Published Jan 31, 2024 • 62
OLMo: Accelerating the Science of Language Models Paper • 2402.00838 • Published Feb 1, 2024 • 83
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens Paper • 2401.17377 • Published Jan 30, 2024 • 36
SliceGPT: Compress Large Language Models by Deleting Rows and Columns Paper • 2401.15024 • Published Jan 26, 2024 • 72
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities Paper • 2401.15071 • Published Jan 26, 2024 • 37
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI Paper • 2401.14019 • Published Jan 25, 2024 • 23
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence Paper • 2401.14196 • Published Jan 25, 2024 • 60
Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All Paper • 2401.13795 • Published Jan 24, 2024 • 68
Rethinking Patch Dependence for Masked Autoencoders Paper • 2401.14391 • Published Jan 25, 2024 • 25
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts Paper • 2401.04081 • Published Jan 8, 2024 • 70
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism Paper • 2401.02954 • Published Jan 5, 2024 • 45
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 259