FineWeb2 Edu Japanese: A high-quality, filtered Japanese dataset (120M texts, 89.3B tokens) for educational AI training.
Yuichi Tateno PRO
hotchpotch
AI & ML interests
IR, Kaggle (Competitions Master)
Recent Activity
liked a model about 9 hours ago: EQUES/TinyDeepSeek-JP-1.5B
liked a model 1 day ago: mmnga/deepseek-r1-distill-qwen2.5-bakeneko-32b-gguf
liked a model 1 day ago: mmnga/qwen2.5-bakeneko-32b-instruct-gguf
Collections 3
spaces 4
models 32

hotchpotch/fineweb-2-edu-japanese-classifier • 103
hotchpotch/fineweb-2-japanese-text-cleaner • 66
hotchpotch/static-embedding-japanese • Sentence Similarity • 19
hotchpotch/tmp-exp034-128
hotchpotch/xlm-roberta-japanese-tokenizer-16k
hotchpotch/xlm-roberta-japanese-tokenizer-12k
hotchpotch/xlm-roberta-japanese-tokenizer-24k
hotchpotch/xlm-roberta-japanese-tokenizer • 1
hotchpotch/japanese-splade-v2 • 265 • 11
hotchpotch/japanese-splade-base-v1_5 • 7
datasets 19

hotchpotch/fineweb-2-edu-japanese • 262M • 1.58k • 5
hotchpotch/fineweb-2-edu-japanese-noise-detect-raw • 64.2M • 332
hotchpotch/fineweb-2-japanese-noise-spans • 344k • 72
hotchpotch/fineweb-2-edu-japanese-scores • 313k • 115 • 1
hotchpotch/sentence_transformer_japanese • 13.2M • 1.49k • 5
hotchpotch/JQaRA • 278k • 911 • 19
hotchpotch/JaCWIR • 518k • 166 • 6
hotchpotch/japanese-splade-v1-hard-negatives • 30.8M • 1.08k
hotchpotch/mmarco-hard-negatives-reranker-score • 7.04M • 999
hotchpotch/msmarco-ja-hard-negatives • 9.34M • 298 • 2