AI & ML interests

Evaluating open LLMs

Recent Activity

open-llm-leaderboard's activity

freddyaboulton
posted an update 3 days ago
Getting WebRTC and WebSockets right in Python is very tricky. If you've tried to wrap an LLM in a real-time audio layer, then you know what I'm talking about.

That's where FastRTC comes in! It makes WebRTC and WebSocket streams super easy, with minimal code and overhead.

Check out our org: hf.co/fastrtc
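
For a sense of how little code that means in practice, here is a minimal echo sketch following the FastRTC quickstart pattern (exact signatures may evolve, so treat this as illustrative):

```python
import numpy as np
from fastrtc import ReplyOnPause, Stream

def echo(audio: tuple[int, np.ndarray]):
    # Stream the caller's audio straight back once they pause speaking.
    yield audio

# Wrap the handler in a send-receive audio stream and serve the built-in UI.
stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch()
```

Swap `echo` for a function that chains speech-to-text, an LLM, and text-to-speech, and you have a real-time voice agent.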
AdinaY
posted an update 3 days ago
Wan2.1 🔥📹 new OPEN video model by the Alibaba Wan team!

Model: Wan-AI/Wan2.1-T2V-14B
Demo: Wan-AI/Wan2.1

✨ Apache 2.0
✨ 8.19GB VRAM, runs on most GPUs
✨ Multi-tasking: T2V, I2V, video editing, T2I, V2A
✨ Text generation: supports Chinese & English
✨ Powerful video VAE: encodes/decodes 1080P with temporal precision
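
If you want to script it rather than use the demo, here is a minimal text-to-video sketch. It assumes the Diffusers-converted checkpoint (`Wan-AI/Wan2.1-T2V-14B-Diffusers`) and the `WanPipeline` integration, so check the model card for the exact repo id and pipeline names:

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Assumed Diffusers-converted repo id; the raw checkpoint is Wan-AI/Wan2.1-T2V-14B.
model_id = "Wan-AI/Wan2.1-T2V-14B-Diffusers"

pipe = WanPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A cat surfing a wave at sunset",
    height=480,
    width=832,
    num_frames=81,  # ~5 s of video at 15 fps
).frames[0]
export_to_video(frames, "wan_t2v.mp4", fps=15)
```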
AdinaY
posted an update 4 days ago
Try QwQ-Max-Preview, Qwen's reasoning model, here 👉 https://chat.qwen.ai
Can't wait for the model weights to drop on the Hugging Face Hub 🔥
AdinaY
posted an update 4 days ago
Two AI startups, DeepSeek & Moonshot AI, keep moving in perfect sync 👇

✨ Last December: DeepSeek & Moonshot AI released their reasoning models on the SAME DAY.
DeepSeek: deepseek-ai/DeepSeek-R1
MoonShot: https://github.com/MoonshotAI/Kimi-k1.5

✨ Last week: Both teams published papers on modifying attention mechanisms on the SAME DAY AGAIN.
DeepSeek: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (2502.11089)
Moonshot: MoBA: Mixture of Block Attention for Long-Context LLMs (2502.13189)

✨ TODAY:
DeepSeek unveiled FlashMLA: an efficient MLA decoding kernel for NVIDIA Hopper GPUs, optimized for variable-length sequences.
https://github.com/deepseek-ai/FlashMLA

Moonshot AI introduced Moonlight: a 3B/16B MoE trained on 5.7T tokens using Muon, pushing the Pareto frontier with fewer FLOPs.
moonshotai/Moonlight-16B-A3B
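
Moonlight is a standard Hub checkpoint, so a minimal generation sketch looks like this (remote-code loading assumed, since the repo ships custom modeling code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Moonlight-16B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

inputs = tokenizer("1+1=2, 1+2=", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```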

What's next? 👀
AdinaY
posted an update 10 days ago
🚀 StepFun (阶跃星辰) is making BIG open moves!

Last year, their GOT-OCR 2.0 took the community by storm 🔥, but many didn't know they were also building some amazing models. Now they've just dropped something huge on the Hub!

📺 Step-Video-T2V: a 30B bilingual open video model that generates 204 frames (8-10 s) at 540P resolution with high information density & consistency.
stepfun-ai/stepvideo-t2v

🔊 Step-Audio-TTS-3B: a TTS model trained with the LLM-Chat paradigm on a large synthetic dataset, capable of generating rap & humming.
stepfun-ai/step-audio-67b33accf45735bb21131b0b
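
Both releases are plain Hub repos, so grabbing the weights is one call with `huggingface_hub` (note the video model is tens of GB):

```python
from huggingface_hub import snapshot_download

# Download the Step-Video-T2V checkpoint into the local HF cache.
local_dir = snapshot_download("stepfun-ai/stepvideo-t2v")
print("Weights at:", local_dir)
```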
AdinaY
posted an update 15 days ago
Ovis2 🔥 a multimodal LLM family released by the Alibaba AIDC team.
AIDC-AI/ovis2-67ab36c7e497429034874464
✨ 1B/2B/4B/8B/16B/34B
✨ Strong CoT for deeper problem solving
✨ Multilingual OCR: expanded beyond English & Chinese, with better data extraction
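
Ovis2 ships custom modeling code, so loading goes through `trust_remote_code`; a minimal sketch with the 8B variant (see the model card for the full image+text chat flow):

```python
import torch
from transformers import AutoModelForCausalLM

# The 8B repo id; other sizes follow the same AIDC-AI/Ovis2-* pattern.
model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis2-8B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda()
```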
AdinaY
posted an update 15 days ago
InspireMusic 🎵🔥 an open music generation framework by Alibaba FunAudio Lab
Model: FunAudioLLM/InspireMusic-1.5B-Long
Demo: FunAudioLLM/InspireMusic
✨ Music, songs, audio - ALL IN ONE
✨ High quality audio: 24kHz & 48kHz sampling rates
✨ Long-Form Generation: enables extended audio creation
✨ Efficient fine-tuning: supports BF16, FP16, and FP32 precision with user-friendly scripts
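
You can also drive the hosted demo programmatically with `gradio_client`; a minimal sketch (endpoint names differ per Space, so inspect them first):

```python
from gradio_client import Client

# Connect to the public demo Space.
client = Client("FunAudioLLM/InspireMusic")

# Print the Space's callable endpoints and their parameters.
client.view_api()
```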
lewtun
posted an update 18 days ago
Introducing OpenR1-Math-220k!

open-r1/OpenR1-Math-220k
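
A minimal sketch for exploring the dataset with the `datasets` library:

```python
from datasets import load_dataset

ds = load_dataset("open-r1/OpenR1-Math-220k", split="train")
print(ds)               # row count and features
print(ds.column_names)  # inspect the available fields
```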

The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch 💪

What’s new compared to existing reasoning datasets?

♾ Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.

🐳 800k R1 reasoning traces: We generate two answers for 400k problems using DeepSeek R1. The filtered dataset contains 220k problems with correct reasoning traces.

📀 512 H100s running locally: Instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, generating 180k reasoning traces per day.

⏳ Automated filtering: We apply Math Verify to retain only problems with at least one correct answer. We also leverage Llama3.3-70B-Instruct as a judge to retrieve more correct examples (e.g. for cases with malformed answers that can't be verified with a rules-based parser).
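
Math Verify is available as the `math-verify` package; the parse-and-verify pattern behind the filtering looks roughly like this (example pair taken from the library's README):

```python
from math_verify import parse, verify

# Parse the gold answer and a model answer into comparable symbolic form.
gold = parse("${1,3} \\cup {2,4}$")
answer = parse("${1,2,3,4}$")

# True: the two expressions are mathematically equivalent.
print(verify(gold, answer))
```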

📊 We match the performance of DeepSeek-Distill-Qwen-7B by finetuning Qwen-7B-Math-Instruct on our dataset.

🔎 Read our blog post for all the nitty-gritty details: https://huggingface.co./blog/open-r1/update-2