Introducing "Hugging Face Dataset Spotlight"
I'm excited to share the first episode of our AI-generated podcast series focusing on nice datasets from the Hugging Face Hub!
This first episode explores mathematical reasoning datasets:
- SynthLabsAI/Big-Math-RL-Verified: Over 250,000 rigorously verified problems spanning multiple difficulty levels and mathematical domains.
- open-r1/OpenR1-Math-220k: 220,000 math problems with multiple reasoning traces, verified for accuracy using Math Verify and Llama-3.3-70B models.
- facebook/natural_reasoning: 1.1 million general reasoning questions carefully deduplicated and decontaminated from existing benchmarks, showing superior scaling effects when training models like Llama3.1-8B-Instruct.
Plus a bonus segment on bespokelabs/bespoke-manim!
https://www.youtube.com/watch?v=-TgmRq45tW4