Daniel van Strien PRO

davanstrien

AI & ML interests

Machine Learning Librarian

Organizations

Hugging Face; Notebooks-explorers; Living with Machines; BigScience Workshop; Spaces-explorers; BigScience Catalogue Data; Hacks/Hackers; flyswot; BigScience: LMs for Historical Texts; Cohere For AI; Webhooks Explorers (BETA); HuggingFaceM4; Open Access AI Collective; HF Canonical Model Maintainers; BigLAM: BigScience Libraries, Archives and Museums; Hugging Face OSS Metrics; ImageIN; Stable Diffusion Bias Eval; Librarian Bots; Blog-explorers; Hacktoberfest 2023; Hugging Face TB Research; geospatial; HPLT; HF-IA-archiving; 2A2I Legacy Models & Datasets; testy; DIBT-for-Klingon; Wikimedia Movement; DIBT-for-Esperanto; Journalists on Hugging Face; PleIAs; Argilla Explorers; Persian AI Community; HuggingFaceFW; Data Is Better Together; Social Post Explorers; OMOTO AI; academic-datasets; HuggingFaceFW-Dev; Hugging Face Discord Community; UCSF-JHU Opioid Industry Documents Archive; Dataset Tools; PDFPages; dibt-private; Data Is Better Together Contributor; Bluesky Community; Open R1

davanstrien's activity

posted an update about 1 hour ago
📊 Introducing "Hugging Face Dataset Spotlight" 📊

I'm excited to share the first episode of our AI-generated podcast series focusing on nice datasets from the Hugging Face Hub!

This first episode explores mathematical reasoning datasets:

- SynthLabsAI/Big-Math-RL-Verified: Over 250,000 rigorously verified problems spanning multiple difficulty levels and mathematical domains.
- open-r1/OpenR1-Math-220k: 220,000 math problems with multiple reasoning traces, verified for accuracy using Math Verify and Llama-3.3-70B models.
- facebook/natural_reasoning: 1.1 million general reasoning questions carefully deduplicated and decontaminated from existing benchmarks, showing superior scaling effects when training models like Llama3.1-8B-Instruct.

Plus a bonus segment on bespokelabs/bespoke-manim!

https://www.youtube.com/watch?v=-TgmRq45tW4
reacted to stefan-it's post with 🔥 about 2 hours ago
After running some 3DMark and FurMark benchmarks on Windows to make sure that my new 5090 is not melting its cables [1], and taking some nice shots with a thermal camera (I don't think that's too much), I found that running fine-tuning experiments with my favorite Flair & Transformers libraries is very easy.

Important steps:

A good idea is to start with a fresh Ubuntu 24.04 installation with the latest CUDA 12.8 and the open NVIDIA driver; follow the further advice in [2]:

sudo apt -y install cuda-toolkit-12-8 nvidia-open

I tried updating an existing Ubuntu installation that had an older CUDA and driver version, and it resulted in an unbootable system.

If you are using PyTorch 2.6 built with CUDA 12.6, it will result in:

NVIDIA Graphics Device with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.

But no worries! For PyTorch you just need to use a nightly 2.7 version that was built with CUDA 12.8. This can easily be done via:

pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128

After that the latest Flair version can be installed and fine-tuning will work!
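
To confirm that an installed PyTorch build was compiled for the GPU's architecture, a check along the following lines may help. This is a minimal sketch, not part of the original post: the helper names are hypothetical, and the exact-match comparison is a simplification of what PyTorch does internally. It relies on `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`, and only prints a notice if PyTorch or a CUDA device is missing.

```python
def capability_to_arch(major: int, minor: int) -> str:
    """Format a compute capability as an arch tag, e.g. (12, 0) becomes 'sm_120'."""
    return f"sm_{major}{minor}"


def is_arch_supported(device_arch: str, compiled_archs: list[str]) -> bool:
    """Return True if the arch tag is among those the PyTorch build targets.

    Simplified assumption: an exact match is required.
    """
    return device_arch in compiled_archs


def check_local_gpu() -> None:
    """Report whether the local PyTorch build lists the GPU's architecture."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("No CUDA device is visible to PyTorch")
        return
    major, minor = torch.cuda.get_device_capability(0)
    arch = capability_to_arch(major, minor)
    compiled = torch.cuda.get_arch_list()
    status = "supported" if is_arch_supported(arch, compiled) else "NOT supported"
    print(f"{arch} is {status} by this build (CUDA {torch.version.cuda}): {compiled}")


if __name__ == "__main__":
    check_local_gpu()
```

With the architecture list from the error message above, `is_arch_supported("sm_120", [...])` is false, which matches the incompatibility reported for the CUDA 12.6 build.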

References:

[1]: https://www.reddit.com/r/nvidia/comments/1inpox7/rtx_50_series_12vhpwr_megathread/
[2]: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_network
posted an update about 21 hours ago
Quick POC: Turn a Hugging Face dataset card into a short podcast introducing the dataset using all open models.

I think I'm the only weirdo who would enjoy listening to something like this though 😅

Here is an example for eth-nlped/stepverify
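
The first step of such a pipeline, turning a dataset card into a prompt for a script-writing model, might look roughly like the sketch below. This is an illustration only and not the code behind the POC: the function names, the prompt wording, and the `max_chars` cutoff are all assumptions, and fetching the card from the Hub and the text-to-speech step are left out.

```python
def strip_front_matter(card_markdown: str) -> str:
    """Drop a leading YAML metadata block delimited by '---' lines, if present."""
    lines = card_markdown.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return "\n".join(lines[i + 1:]).strip()
    # No (closed) metadata block found: return the text unchanged.
    return card_markdown.strip()


def build_podcast_prompt(dataset_id: str, card_markdown: str, max_chars: int = 4000) -> str:
    """Build a hypothetical prompt asking a language model for a short podcast script."""
    body = strip_front_matter(card_markdown)[:max_chars]
    return (
        f"Write a short, friendly podcast script introducing the dataset '{dataset_id}'.\n"
        "Base it only on the dataset card below.\n\n"
        f"{body}"
    )
```

The resulting string would then be passed to an open language model, and the generated script to an open text-to-speech model.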
New activity in bespokelabs/bespoke-manim 1 day ago

add curator tag

#2 opened 1 day ago by davanstrien
New activity in pimpalgaonkar/poems_test 1 day ago

add curator tag

#2 opened 1 day ago by davanstrien
New activity in pimpalgaonkar/poems_test_2 1 day ago

add curator tag

#1 opened 1 day ago by davanstrien