Enabling Scalable Oversight via Self-Evolving Critic • Paper • 2501.05727 • Published about 1 month ago • 70
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination • Paper • 2411.03823 • Published Nov 6, 2024 • 45
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models • Paper • 2410.14059 • Published Oct 17, 2024 • 57
Roadmap towards Superhuman Speech Understanding using Large Language Models • Paper • 2410.13268 • Published Oct 17, 2024 • 33