A Practitioner's Guide to Continual Multimodal Pretraining Paper • 2408.14471 • Published Aug 26, 2024
CiteME: Can Language Models Accurately Cite Scientific Claims? Paper • 2407.12861 • Published Jul 10, 2024
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation Paper • 2502.19414 • Published 2 days ago • 16
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs Paper • 2502.19413 • Published 2 days ago • 14
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence Paper • 2502.09927 • Published 15 days ago
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs Paper • 2502.12982 • Published 10 days ago • 13
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Paper • 2502.09604 • Published 15 days ago • 32
Great Models Think Alike and this Undermines AI Oversight Paper • 2502.04313 • Published 22 days ago • 30
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 24 days ago • 195
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published Jan 28 • 36
BlockPruner: Fine-grained Pruning for Large Language Models Paper • 2406.10594 • Published Jun 15, 2024
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published Jan 23 • 44
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback Paper • 2501.10799 • Published Jan 18 • 15