Guilherme Penedo
guipenedo
755 followers · 6 following
gui_penedo
guipenedo
AI & ML interests
None yet
Articles
FineWeb2-C: Help Build Better Language Models in Your Language
20 days ago
•
12
Organizations
guipenedo's activity
New activity in
HuggingFaceFW/fineweb-2
3 days ago
Cannot load with datasets
3
#4 opened 3 days ago by
mbanon
New activity in
HuggingFaceFW/fineweb-edu
5 days ago
New update returns a 500 server error using the datasets-server API
3
#18 opened 15 days ago by
jonna32
A lot of load errors after new update
14
#19 opened 6 days ago by
yzhangcs
Add "date" column to "default" subset
#20 opened 5 days ago by
lhoestq
New activity in
HuggingFaceFW/fineweb
25 days ago
Simple exact deduplication removes 2/3 of data.
4
#49 opened 5 months ago by
egor-pakhomov
Torrent?
3
#4 opened 9 months ago by
emilss
Any plan to train models on larger subset of dataset?
1
#8 opened 9 months ago by
mrfakename
Are copyrighted works included in this dataset?
4
#9 opened 9 months ago by
umm-maybe
Reprocessing for a new language
14
#12 opened 9 months ago by
pere
Training configs for data ablation study
2
#14 opened 9 months ago by
jimmyhbx
tiny-fineweb
3
#19 opened 9 months ago by
3thn
Unsafe files
1
#25 opened 8 months ago by
alielfilali01
"Reproducing GPT-2 (124M) in llm.c in 90 minutes for $20" using fineweb by Karpathy
#28 opened 8 months ago by
clem
Regarding the newly updated indexes (written as deduplication issues)
5
#29 opened 7 months ago by
kimcando
Dedup
1
#32 opened 7 months ago by
shawnkx
Language subset
3
#33 opened 7 months ago by
talmor
How to compute the aggregate score?
1
#35 opened 7 months ago by
mornmirror
why do you apply "All filters except the (very destructive) terminal_punct"
3
#36 opened 7 months ago by
bpwl0121
Reproducibility of the work for other languages
3
#38 opened 7 months ago by
camillop
Fineweb train configuration
3
#39 opened 7 months ago by
nezhazheng