Using 🤗 Datasets
Once you’ve found an interesting dataset on the Hugging Face Hub, you can load it with 🤗 Datasets. Click the Use in dataset library button on the dataset page to copy the code that loads it.
First, log in with your Hugging Face account (this is needed for private or gated datasets and for uploading), for example by running:
huggingface-cli login
Then you can load a dataset from the Hugging Face Hub:
from datasets import load_dataset
dataset = load_dataset("username/my_dataset")
# or load the separate splits if the dataset has train/validation/test splits
train_dataset = load_dataset("username/my_dataset", split="train")
valid_dataset = load_dataset("username/my_dataset", split="validation")
test_dataset = load_dataset("username/my_dataset", split="test")
You can also upload datasets to the Hugging Face Hub:
my_new_dataset.push_to_hub("username/my_new_dataset")
This creates a dataset repository, username/my_new_dataset, containing your Dataset in Parquet format, which you can reload later.
For more information about using 🤗 Datasets, check out the tutorials and how-to guides available in the 🤗 Datasets documentation.