Hub documentation

Downloading datasets

Integrated libraries

If a dataset on the Hub is tied to a supported library, you can load it in just a few lines. To see how, click the “Use in dataset library” button on the dataset page. For example, the samsum dataset page shows how to load it with 🤗 Datasets.
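As a rough sketch of the kind of snippet that button shows for 🤗 Datasets, the helper below wraps `load_dataset` from the `datasets` package. The helper name `load_hub_dataset` is hypothetical and not part of any library, and the example assumes `datasets` is installed. The import is deferred into the function so nothing is fetched until the helper is called.

```python
def load_hub_dataset(dataset_id, split=None):
    """Load a Hub dataset with the 🤗 Datasets library.

    Hypothetical helper for illustration; assumes `pip install datasets`.
    """
    # Imported lazily so that defining the helper does not require the package.
    from datasets import load_dataset

    # With split=None every available split is returned;
    # pass a name such as "train" to get a single split.
    return load_dataset(dataset_id, split=split)
```

Calling `load_hub_dataset("YOUR_DATASET_ID", split="train")` would then return the training split, assuming the repository defines one. `YOUR_DATASET_ID` is a placeholder.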

Using the Hugging Face Client Library

You can use the huggingface_hub library to create, delete, update, and retrieve information from repos. You can also download files from repos or integrate them into your own library. For example, you can load a CSV dataset with Pandas in a few lines:

from huggingface_hub import hf_hub_download
import pandas as pd

REPO_ID = "YOUR_REPO_ID"
FILENAME = "data.csv"

dataset = pd.read_csv(
    hf_hub_download(repo_id=REPO_ID, filename=FILENAME, repo_type="dataset")
)
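When the exact file name is not known in advance, one possible approach is to list the repo's files first and then download the ones you need. The sketch below uses `list_repo_files` and `hf_hub_download` from `huggingface_hub`. The helper names `find_csv_files` and `download_csv_files` are hypothetical, and the optional `revision` argument (a branch, tag, or commit hash) is included only to show how a download could be pinned.

```python
def find_csv_files(repo_id, revision=None):
    """Return the sorted paths of all CSV files in a dataset repo.

    Hypothetical helper; assumes `pip install huggingface_hub`.
    """
    # Imported lazily so that defining the helper does not require the package.
    from huggingface_hub import list_repo_files

    files = list_repo_files(repo_id, repo_type="dataset", revision=revision)
    return sorted(name for name in files if name.lower().endswith(".csv"))


def download_csv_files(repo_id, revision=None):
    """Download every CSV file in the repo and map each name to its local path."""
    from huggingface_hub import hf_hub_download

    return {
        name: hf_hub_download(
            repo_id=repo_id,
            filename=name,
            repo_type="dataset",
            revision=revision,
        )
        for name in find_csv_files(repo_id, revision=revision)
    }
```

Each returned path points into the local cache, so it can be passed to `pd.read_csv` in the same way as in the example above.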

Using Git

Since all datasets on the Hub are Git repositories, you can clone the datasets locally by running:

git lfs install
git clone git@hf.co:datasets/<dataset ID> # example: git clone git@hf.co:datasets/allenai/c4

If you have write access to the dataset repo, you can also commit and push revisions to it.

Add your SSH public key to your user settings to push changes and/or access private repos.
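If you want to script the clone step, a minimal sketch might look like the following. It assumes the SSH URL form `git@hf.co:datasets/<dataset ID>`, that Git LFS has already been set up with `git lfs install`, and that your SSH key is registered. The function names and the `dry_run` flag are hypothetical conveniences, not part of any Hub tooling.

```python
import subprocess


def dataset_clone_url(dataset_id):
    """Build the SSH clone URL for a dataset repo such as "allenai/c4"."""
    return f"git@hf.co:datasets/{dataset_id}"


def clone_dataset(dataset_id, target_dir=None, dry_run=False):
    """Run `git clone` for a dataset repo and return the command that was built.

    With dry_run=True the command is only constructed, never executed.
    """
    command = ["git", "clone", dataset_clone_url(dataset_id)]
    if target_dir is not None:
        command.append(target_dir)
    if not dry_run:
        # check=True raises CalledProcessError if git exits with an error.
        subprocess.run(command, check=True)
    return command
```

For instance, `clone_dataset("allenai/c4", dry_run=True)` only builds the command list, which can be handy for checking the URL before starting a large clone.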