AI & ML interests

Evaluating open LLMs

Recent Activity

open-llm-leaderboard's activity

freddyaboulton
posted an update 3 days ago
Getting WebRTC and WebSockets right in Python is very tricky. If you've tried to wrap an LLM in a real-time audio layer, then you know what I'm talking about.

That's where FastRTC comes in! It makes WebRTC and WebSocket streams super easy, with minimal code and overhead.

Check out our org: hf.co/fastrtc
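
For a sense of how little code that means in practice, here is a minimal echo sketch following the FastRTC quickstart pattern (exact signatures may evolve, so treat this as illustrative):

```python
import numpy as np
from fastrtc import ReplyOnPause, Stream

def echo(audio: tuple[int, np.ndarray]):
    # Stream the caller's audio straight back once they pause speaking.
    yield audio

# Wrap the handler in a send-receive audio stream and serve the built-in UI.
stream = Stream(ReplyOnPause(echo), modality="audio", mode="send-receive")
stream.ui.launch()
```

Swap `echo` for a function that chains speech-to-text, an LLM, and text-to-speech, and you have a real-time voice agent.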
AdinaY
posted an update 3 days ago
Wan2.1 🔥📹 new OPEN video model by the Alibaba Wan team!

Model: Wan-AI/Wan2.1-T2V-14B
Demo: Wan-AI/Wan2.1

✨ Apache 2.0
✨ 8.19GB VRAM, runs on most GPUs
✨ Multi-tasking: T2V, I2V, video editing, T2I, V2A
✨ Text generation: supports Chinese & English
✨ Powerful video VAE: encodes/decodes 1080P with temporal precision
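
If you want to script it rather than use the demo, here is a minimal text-to-video sketch. It assumes the Diffusers-converted checkpoint (`Wan-AI/Wan2.1-T2V-14B-Diffusers`) and the `WanPipeline` integration, so check the model card for the exact repo id and pipeline names:

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Assumed Diffusers-converted repo id; the raw checkpoint is Wan-AI/Wan2.1-T2V-14B.
model_id = "Wan-AI/Wan2.1-T2V-14B-Diffusers"

pipe = WanPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A cat surfing a wave at sunset",
    height=480,
    width=832,
    num_frames=81,  # ~5 s of video at 15 fps
).frames[0]
export_to_video(frames, "wan_t2v.mp4", fps=15)
```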
AdinaY
posted an update 4 days ago
Try QwQ-Max-Preview, Qwen's reasoning model, here 👉 https://chat.qwen.ai
Can't wait for the model weights to drop on the Hugging Face Hub 🔥
AdinaY
posted an update 4 days ago
Two AI startups, DeepSeek & Moonshot AI, keep moving in perfect sync 👇

✨ Last December: DeepSeek & Moonshot AI released their reasoning models on the SAME DAY.
DeepSeek: deepseek-ai/DeepSeek-R1
MoonShot: https://github.com/MoonshotAI/Kimi-k1.5

✨ Last week: Both teams published papers on modifying attention mechanisms on the SAME DAY AGAIN.
DeepSeek: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (2502.11089)
Moonshot: MoBA: Mixture of Block Attention for Long-Context LLMs (2502.13189)

✨ TODAY:
DeepSeek unveiled FlashMLA: an efficient MLA decoding kernel for NVIDIA Hopper GPUs, optimized for variable-length sequences.
https://github.com/deepseek-ai/FlashMLA

Moonshot AI introduced Moonlight: a 3B/16B MoE trained on 5.7T tokens using Muon, pushing the Pareto frontier with fewer FLOPs.
moonshotai/Moonlight-16B-A3B
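
Moonlight is a standard Hub checkpoint, so a minimal generation sketch looks like this (remote-code loading assumed, since the repo ships custom modeling code):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moonshotai/Moonlight-16B-A3B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

inputs = tokenizer("1+1=2, 1+2=", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```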

What's next? 👀
AdinaY
posted an update 10 days ago
🚀 StepFun (阶跃星辰) is making BIG open moves!

Last year, their GOT-OCR 2.0 took the community by storm 🔥, but many didn't know they were also building some amazing models. Now they've just dropped something huge on the Hub!

📺 Step-Video-T2V: a 30B bilingual open video model that generates 204 frames (8-10 s) at 540P resolution with high information density & consistency.
stepfun-ai/stepvideo-t2v

🔊 Step-Audio-TTS-3B: a TTS model trained with the LLM-Chat paradigm on a large synthetic dataset, capable of generating rap & humming.
stepfun-ai/step-audio-67b33accf45735bb21131b0b
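
Both releases are plain Hub repos, so grabbing the weights is one call with `huggingface_hub` (note the video model is tens of GB):

```python
from huggingface_hub import snapshot_download

# Download the Step-Video-T2V checkpoint into the local HF cache.
local_dir = snapshot_download("stepfun-ai/stepvideo-t2v")
print("Weights at:", local_dir)
```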
AdinaY
posted an update 15 days ago
Ovis2 🔥 a multimodal LLM family released by the Alibaba AIDC team.
AIDC-AI/ovis2-67ab36c7e497429034874464
✨ 1B/2B/4B/8B/16B/34B
✨ Strong CoT for deeper problem solving
✨ Multilingual OCR: expanded beyond English & Chinese, with better data extraction
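
Ovis2 ships custom modeling code, so loading goes through `trust_remote_code`; a minimal sketch with the 8B variant (see the model card for the full image+text chat flow):

```python
import torch
from transformers import AutoModelForCausalLM

# The 8B repo id; other sizes follow the same AIDC-AI/Ovis2-* pattern.
model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Ovis2-8B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda()
```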
AdinaY
posted an update 15 days ago
InspireMusic 🎵🔥 an open music generation framework by Alibaba FunAudio Lab
Model: FunAudioLLM/InspireMusic-1.5B-Long
Demo: FunAudioLLM/InspireMusic
✨ Music, songs, audio - ALL IN ONE
✨ High quality audio: 24kHz & 48kHz sampling rates
✨ Long-Form Generation: enables extended audio creation
✨ Efficient fine-tuning: supports BF16, FP16, and FP32 precision with user-friendly scripts
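
You can also drive the hosted demo programmatically with `gradio_client`; a minimal sketch (endpoint names differ per Space, so inspect them first):

```python
from gradio_client import Client

# Connect to the public demo Space.
client = Client("FunAudioLLM/InspireMusic")

# Print the Space's callable endpoints and their parameters.
client.view_api()
```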
lewtun
posted an update 18 days ago
Introducing OpenR1-Math-220k!

open-r1/OpenR1-Math-220k
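
A minimal sketch for exploring the dataset with the `datasets` library:

```python
from datasets import load_dataset

ds = load_dataset("open-r1/OpenR1-Math-220k", split="train")
print(ds)               # row count and features
print(ds.column_names)  # inspect the available fields
```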

The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch 💪

What’s new compared to existing reasoning datasets?

♾ Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.

🐳 800k R1 reasoning traces: We generate two answers for 400k problems using DeepSeek R1. The filtered dataset contains 220k problems with correct reasoning traces.

📀 512 H100s running locally: Instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, generating 180k reasoning traces per day.

⏳ Automated filtering: We apply Math Verify to retain only problems with at least one correct answer. We also leverage Llama3.3-70B-Instruct as a judge to retrieve more correct examples (e.g. for cases with malformed answers that can't be verified with a rules-based parser).
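
Math Verify is available as the `math-verify` package; the parse-and-verify pattern behind the filtering looks roughly like this (example pair taken from the library's README):

```python
from math_verify import parse, verify

# Parse the gold answer and a model answer into comparable symbolic form.
gold = parse("${1,3} \\cup {2,4}$")
answer = parse("${1,2,3,4}$")

# True: the two expressions are mathematically equivalent.
print(verify(gold, answer))
```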

📊 We match the performance of DeepSeek-Distill-Qwen-7B by finetuning Qwen-7B-Math-Instruct on our dataset.

🔎 Read our blog post for all the nitty-gritty details: https://huggingface.co./blog/open-r1/update-2