Libraries
The Datasets Hub has support for several libraries in the Open Source ecosystem. Thanks to the huggingface_hub Python library, it's easy to enable sharing your datasets on the Hub. We're happy to welcome to the Hub a set of Open Source libraries that are pushing Machine Learning forward.
The table below summarizes the supported libraries and their level of integration.
Library | Description | Download from Hub | Push to Hub |
---|---|---|---|
Argilla | Collaboration tool for AI engineers and domain experts that value high quality data. | β | β |
Dask | Parallel and distributed computing library that scales the existing Python and PyData ecosystem. | β | β |
Datasets | 🤗 Datasets is a library for accessing and sharing datasets for Audio, Computer Vision, and Natural Language Processing (NLP). | β | β |
Distilabel | The framework for synthetic data generation and AI feedback. | β | β |
DuckDB | In-process SQL OLAP database management system. | β | β |
FiftyOne | FiftyOne is a library for curation and visualization of image, video, and 3D data. | β | β |
Pandas | Python data analysis toolkit. | β | β |
Polars | A DataFrame library on top of an OLAP query engine. | β | β |
Spark | Real-time, large-scale data processing tool in a distributed environment. | β | β |
WebDataset | Library to write I/O pipelines for large datasets. | β | β |
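
As a rough illustration of what the "Download from Hub" and "Push to Hub" columns mean in practice, here is a minimal sketch using Pandas. It assumes `pandas`, `pyarrow`, and `huggingface_hub` are installed, that installing `huggingface_hub` makes `hf://` paths readable by Pandas, and that you are logged in with write access to an existing dataset repository. The helper names, the repository IDs, and the file name are hypothetical and chosen only for this example.

```python
def hub_path(repo_id: str, filename: str) -> str:
    """Build an hf:// path to a file inside a dataset repository on the Hub."""
    return f"hf://datasets/{repo_id}/{filename}"


def download_from_hub(repo_id: str, filename: str):
    """Read a Parquet file from a dataset repository into a DataFrame."""
    # Imported lazily so the path helper works without pandas installed.
    import pandas as pd

    return pd.read_parquet(hub_path(repo_id, filename))


def push_to_hub(df, repo_id: str, filename: str) -> None:
    """Write a DataFrame as Parquet into a dataset repository.

    Assumes the repository already exists and you have write access.
    """
    df.to_parquet(hub_path(repo_id, filename))


# Hypothetical usage (needs network access and authentication):
# df = download_from_hub("username/my_dataset", "data.parquet")
# push_to_hub(df, "username/my_dataset_copy", "data.parquet")
```

Other libraries in the table expose their own entry points for the same two operations, so treat this as one possible shape rather than the integration each library uses.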