Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published 5 days ago • 29
DateLogicQA: Benchmarking Temporal Biases in Large Language Models Paper • 2412.13377 • Published 7 days ago • 2
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published 16 days ago • 68
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models Paper • 2411.14982 • Published Nov 22 • 15
Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks Paper • 2411.01192 • Published Nov 2 • 3