Post
1499
Introducing FineWeb-C 🌐🎓, a community-built dataset for improving language models in ALL languages.
Inspired by FineWeb-Edu the community is labelling the educational quality of texts for many languages.
318 annotators, 32K+ annotations, 12 languages - and growing! 🌍
data-is-better-together/fineweb-c
Inspired by FineWeb-Edu the community is labelling the educational quality of texts for many languages.
318 annotators, 32K+ annotations, 12 languages - and growing! 🌍
data-is-better-together/fineweb-c