metadata
title: DataHubHub
emoji: 
colorFrom: red
colorTo: indigo
sdk: streamlit
sdk_version: 1.42.2
app_file: app.py
pinned: false
license: apache-2.0
language: en

ML Dataset & Code Generation Manager

A comprehensive platform for ML dataset management and code generation with Hugging Face integration.

Features

  • Dataset Management: Upload, explore, and manage machine learning datasets
  • Data Visualization: Visualize dataset statistics and distributions
  • Code Generation: Fine-tune models for code generation tasks
  • Code Quality Tools: Improve code quality with integrated formatters, linters, and type checkers

Technology Stack

  • Frontend: Streamlit
  • Backend: Python
  • Database: SQLite (via SQLAlchemy)
  • ML Integration: Hugging Face Transformers, Datasets
  • Visualization: Plotly, Matplotlib

Project Structure

.
├── app.py                     # Main application entry point
├── components/                # UI components
│   ├── code_quality.py        # Code quality tools
│   ├── dataset_preview.py     # Dataset preview component
│   ├── dataset_statistics.py  # Dataset statistics component
│   ├── dataset_uploader.py    # Dataset upload component
│   ├── dataset_validation.py  # Dataset validation component
│   ├── dataset_visualization.py # Dataset visualization component
│   └── fine_tuning/           # Fine-tuning components
│       ├── finetune_ui.py     # Fine-tuning UI
│       └── model_interface.py # Model interface
├── database/                  # Database configuration
│   ├── models.py              # Database models
│   └── operations.py          # Database operations
├── utils/                     # Utility functions
│   ├── dataset_utils.py       # Dataset utilities
│   ├── huggingface_integration.py # Hugging Face integration
│   └── smolagents_integration.py # smolagents integration
└── assets/                    # Static assets

Deployment

This application is designed to be deployed as a Hugging Face Space.

Hugging Face Space Deployment

  1. Fork this repository
  2. Create a new Hugging Face Space
  3. Connect the forked repository to your Space
  4. The application will be deployed automatically
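
As an alternative to connecting a fork through the web interface, a Space can also be created and populated from Python with the `huggingface_hub` client. The sketch below is not the procedure described in the steps above. It assumes `huggingface_hub` is installed and that you are already authenticated with a write token. The function name `deploy_space` and the Space ID `your-username/DataHubHub` are placeholders.

```python
def deploy_space(api, repo_id, folder="."):
    """Create a Streamlit Space if needed and upload the project folder.

    `api` is expected to be a huggingface_hub.HfApi instance.
    """
    # exist_ok=True avoids an error when the Space already exists.
    api.create_repo(
        repo_id=repo_id,
        repo_type="space",
        space_sdk="streamlit",
        exist_ok=True,
    )
    # Upload the local files as a commit to the Space repository,
    # skipping the local secrets file so the token is not published.
    api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="space",
        ignore_patterns=[".streamlit/secrets.toml"],
    )


# Example usage (requires network access and a valid token):
#   from huggingface_hub import HfApi
#   deploy_space(HfApi(), "your-username/DataHubHub")
```

Passing the client in as an argument keeps the function easy to try against a stub before pointing it at a real account.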

Local Development

  1. Clone the repository
  2. Install dependencies:
    pip install streamlit pandas numpy plotly matplotlib scikit-learn SQLAlchemy huggingface-hub datasets transformers torch
    
  3. Run the application:
    streamlit run app.py
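
Before launching, it can help to confirm that the packages from step 2 are actually installed in the active environment. The following is a minimal sketch using only the standard library. The helper name `missing_packages` and the `REQUIRED` list are illustrative and not part of this repository.

```python
from importlib import metadata

# Distribution names copied from the pip command in step 2.
REQUIRED = [
    "streamlit", "pandas", "numpy", "plotly", "matplotlib", "scikit-learn",
    "SQLAlchemy", "huggingface-hub", "datasets", "transformers", "torch",
]


def missing_packages(names):
    """Return the distribution names that are not installed (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages(REQUIRED)
    if absent:
        print("Missing packages:", ", ".join(absent))
    else:
        print("All dependencies are installed.")
```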
    

Configuration

  • .streamlit/config.toml: Streamlit configuration
  • .streamlit/secrets.toml: Secrets and API keys
  • huggingface-spacefile: Hugging Face Space configuration

API Keys

To use the Hugging Face integration features, add your Hugging Face API token to .streamlit/secrets.toml, replacing the HF_TOKEN placeholder with your actual token:

[huggingface]
hf_token = "HF_TOKEN"
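
Inside the app, Streamlit exposes the contents of .streamlit/secrets.toml through the mapping-like `st.secrets` object, so the value above would be read as `st.secrets["huggingface"]["hf_token"]`. A minimal sketch of a defensive lookup follows. The helper name `get_hf_token` is hypothetical and not taken from this repository, and it accepts any mapping so it can be exercised without Streamlit.

```python
from collections.abc import Mapping


def get_hf_token(secrets: Mapping) -> str:
    """Return the Hugging Face token from a secrets mapping (hypothetical helper).

    Expects a [huggingface] table containing an hf_token key.
    """
    try:
        token = secrets["huggingface"]["hf_token"]
    except KeyError as exc:
        raise RuntimeError(
            "Add [huggingface] hf_token to .streamlit/secrets.toml"
        ) from exc
    # Reject an empty value or the literal placeholder from the example.
    if not token or token == "HF_TOKEN":
        raise RuntimeError("Replace the HF_TOKEN placeholder with a real token")
    return token


# In the Streamlit app this would be called as:
#   import streamlit as st
#   token = get_hf_token(st.secrets)
```

Failing early with a clear message is a design choice here: it surfaces a missing or placeholder token at startup rather than as an authentication error deep inside a Hugging Face call.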

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.