Running • 103 • TxT360: Trillion Extracted Text • Create a large, deduplicated dataset for LLM pre-training
facebook/seamless-m4t-v2-large • Automatic Speech Recognition • Updated Jan 4, 2024 • 77.8k • 780