title: DataHubHub
emoji: ⚡
colorFrom: red
colorTo: indigo
sdk: streamlit
sdk_version: 1.42.2
app_file: app.py
pinned: false
license: apache-2.0
language: en
ML Dataset & Code Generation Manager
A comprehensive platform for ML dataset management and code generation with Hugging Face integration.
Features
- Dataset Management: Upload, explore, and manage machine learning datasets
- Data Visualization: Visualize dataset statistics and distributions
- Code Generation: Fine-tune models for code generation tasks
- Code Quality Tools: Improve code quality with integrated formatters, linters, and type checkers
Technology Stack
- Frontend: Streamlit
- Backend: Python
- Database: SQLite (via SQLAlchemy)
- ML Integration: Hugging Face Transformers, Datasets
- Visualization: Plotly, Matplotlib
Project Structure
.
├── app.py                             # Main application entry point
├── components/                        # UI components
│   ├── code_quality.py                # Code quality tools
│   ├── dataset_preview.py             # Dataset preview component
│   ├── dataset_statistics.py          # Dataset statistics component
│   ├── dataset_uploader.py            # Dataset upload component
│   ├── dataset_validation.py          # Dataset validation component
│   ├── dataset_visualization.py       # Dataset visualization component
│   └── fine_tuning/                   # Fine-tuning components
│       ├── finetune_ui.py             # Fine-tuning UI
│       └── model_interface.py         # Model interface
├── database/                          # Database configuration
│   ├── models.py                      # Database models
│   └── operations.py                  # Database operations
├── utils/                             # Utility functions
│   ├── dataset_utils.py               # Dataset utilities
│   ├── huggingface_integration.py     # Hugging Face integration
│   └── smolagents_integration.py      # smolagents integration
└── assets/                            # Static assets
Deployment
This application is designed to be deployed as a Hugging Face Space.
Hugging Face Space Deployment
- Fork this repository
- Create a new Hugging Face Space
- Connect the forked repository to your Space
- The application will be deployed automatically
Local Development
- Clone the repository
- Install dependencies:
pip install streamlit pandas numpy plotly matplotlib scikit-learn SQLAlchemy huggingface-hub datasets transformers torch
- Run the application:
streamlit run app.py
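Before running the app, it can help to confirm that the install step succeeded. The following is a minimal sketch, not part of the repository: the `missing_packages` helper and the `PACKAGES` table are hypothetical names introduced here. The table maps each package from the pip command above to the module name it is imported under, since several differ (for example, scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Hypothetical lookup table: pip package name -> top-level import name.
# The pip names come from the install command above.
PACKAGES = {
    "streamlit": "streamlit",
    "pandas": "pandas",
    "numpy": "numpy",
    "plotly": "plotly",
    "matplotlib": "matplotlib",
    "scikit-learn": "sklearn",
    "SQLAlchemy": "sqlalchemy",
    "huggingface-hub": "huggingface_hub",
    "datasets": "datasets",
    "transformers": "transformers",
    "torch": "torch",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names whose import module cannot be located."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if find_spec(module_name) is None  # None means the module is not installed
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All dependencies found.")
```

The check only locates each module without importing it, so it stays fast even for heavy packages such as torch.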
Configuration
- .streamlit/config.toml: Streamlit configuration
- .streamlit/secrets.toml: Secrets and API keys
- huggingface-spacefile: Hugging Face Space configuration
API Keys
To use the Hugging Face integration features, add your Hugging Face API token to .streamlit/secrets.toml, replacing the HF_TOKEN placeholder with your own token:
[huggingface]
hf_token = "HF_TOKEN"
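One way the application code might read this token is sketched below. This is an assumption about usage, not the repository's actual implementation: `get_hf_token` is a hypothetical helper, and the fallback to an `HF_TOKEN` environment variable is an addition for environments that have no secrets file. The helper accepts any mapping, so it can be called with Streamlit's `st.secrets` object or with a plain dict.

```python
import os


def get_hf_token(secrets, env_var="HF_TOKEN"):
    """Return the Hugging Face token, or None if none is configured.

    `secrets` is any mapping shaped like the TOML above, for example
    Streamlit's `st.secrets`. Hypothetical helper for illustration.
    """
    section = secrets.get("huggingface") or {}
    token = section.get("hf_token")
    # Treat the unedited "HF_TOKEN" placeholder as "not configured".
    if token and token != "HF_TOKEN":
        return token
    # Assumed fallback: an environment variable (name is an assumption).
    return os.environ.get(env_var) or None
```

Inside the app this would be called as `get_hf_token(st.secrets)` after `import streamlit as st`. Note that Streamlit may raise an error when `st.secrets` is accessed and no secrets file exists, so a caller may want to guard that access.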
License
This project is licensed under the Apache-2.0 License - see the LICENSE file for details.