davanstrien
posted an update 4 days ago
Introducing FineWeb-C 🎓, a community-built dataset for improving language models in ALL languages.

Inspired by FineWeb-Edu, the community is labelling the educational quality of texts in many languages.

318 annotators, 32K+ annotations, 12 languages - and growing! 🌍

data-is-better-together/fineweb-c