Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference Paper • 2412.13663 • Published 7 days ago • 103
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 6 days ago • 89
The Open Source Advantage in Large Language Models (LLMs) Paper • 2412.12004 • Published 9 days ago • 9
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding Paper • 2412.09604 • Published 12 days ago • 35
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 11 days ago • 131
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions Paper • 2412.08737 • Published 13 days ago • 51
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions Paper • 2412.09596 • Published 12 days ago • 90
POINTS1.5: Building a Vision-Language Model towards Real World Applications Paper • 2412.08443 • Published 14 days ago • 38
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Paper • 2412.08580 • Published 13 days ago • 44
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints Paper • 2412.07760 • Published 14 days ago • 49
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published 15 days ago • 45
Evaluating and Aligning CodeLLMs on Human Preference Paper • 2412.05210 • Published 18 days ago • 47
STIV: Scalable Text and Image Conditioned Video Generation Paper • 2412.07730 • Published 14 days ago • 69
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 15 days ago • 61
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published 16 days ago • 68
Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation Paper • 2412.06531 • Published 16 days ago • 71
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published 18 days ago • 45
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases Paper • 2412.04862 • Published 19 days ago • 48