instruction-pretrain/instruction-synthesizer • Text Generation
TxT360: Trillion Extracted Text 📖 Create a large, deduplicated dataset for LLM pre-training