---
title: DataHubHub
emoji: ⚡
colorFrom: red
colorTo: indigo
sdk: streamlit
sdk_version: 1.42.2
app_file: app.py
pinned: false
license: apache-2.0
language: en
---

# ML Dataset & Code Generation Manager

A platform for managing machine learning datasets and fine-tuning code generation models, with Hugging Face integration.

## Features

- **Dataset Management**: Upload, explore, and manage machine learning datasets
- **Data Visualization**: Visualize dataset statistics and distributions
- **Code Generation**: Fine-tune models for code generation tasks
- **Code Quality Tools**: Improve code quality with integrated formatters, linters, and type checkers

## Technology Stack

- **Frontend**: Streamlit
- **Backend**: Python
- **Database**: SQLite (via SQLAlchemy)
- **ML Integration**: Hugging Face Transformers, Datasets
- **Visualization**: Plotly, Matplotlib

## Project Structure

```
.
├── app.py                          # Main application entry point
├── components/                     # UI components
│   ├── code_quality.py             # Code quality tools
│   ├── dataset_preview.py          # Dataset preview component
│   ├── dataset_statistics.py       # Dataset statistics component
│   ├── dataset_uploader.py         # Dataset upload component
│   ├── dataset_validation.py       # Dataset validation component
│   ├── dataset_visualization.py    # Dataset visualization component
│   └── fine_tuning/                # Fine-tuning components
│       ├── finetune_ui.py          # Fine-tuning UI
│       └── model_interface.py      # Model interface
├── database/                       # Database configuration
│   ├── models.py                   # Database models
│   └── operations.py               # Database operations
├── utils/                          # Utility functions
│   ├── dataset_utils.py            # Dataset utilities
│   ├── huggingface_integration.py  # Hugging Face integration
│   └── smolagents_integration.py   # smolagents integration
└── assets/                         # Static assets
```

## Deployment

This application is designed to be deployed as a Hugging Face Space.

### Hugging Face Space Deployment

1. Fork this repository
2. Create a new Hugging Face Space
3. Connect the forked repository to your Space
4. The application will be deployed automatically

### Local Development

1. Clone the repository
2. Install dependencies:
   ```
   pip install streamlit pandas numpy plotly matplotlib scikit-learn SQLAlchemy huggingface-hub datasets transformers torch
   ```
3. Run the application:
   ```
   streamlit run app.py
   ```

## Configuration

- `.streamlit/config.toml`: Streamlit configuration
- `.streamlit/secrets.toml`: Secrets and API keys
- `huggingface-spacefile`: Hugging Face Space configuration

## API Keys

To use the Hugging Face integration features, add your Hugging Face API token to `.streamlit/secrets.toml`, replacing the `HF_TOKEN` placeholder with your own token:

```toml
[huggingface]
hf_token = "HF_TOKEN"
```

## License

This project is licensed under the Apache-2.0 License. See the LICENSE file for details.
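
As a companion to the API Keys section above, here is a minimal sketch of how the token could be looked up at runtime. It is not taken from `app.py`: the helper name `get_hf_token`, the fallback to an `HF_TOKEN` environment variable, and the choice to treat the literal placeholder `"HF_TOKEN"` as "not configured" are all assumptions made for illustration. The function accepts any mapping shaped like the `[huggingface]` table in `.streamlit/secrets.toml`.

```python
import os


def get_hf_token(secrets, env=os.environ):
    """Return a Hugging Face token, or None if none is configured.

    `secrets` is any mapping shaped like .streamlit/secrets.toml, i.e.
    {"huggingface": {"hf_token": "..."}}. This helper is hypothetical and
    only illustrates one possible lookup order.
    """
    # Look for the [huggingface] table and its hf_token key.
    section = secrets["huggingface"] if "huggingface" in secrets else {}
    token = section["hf_token"] if "hf_token" in section else None

    # Assumption: the README placeholder value means "not filled in yet".
    if token and token != "HF_TOKEN":
        return token

    # Assumption: fall back to an HF_TOKEN environment variable.
    return env.get("HF_TOKEN") or None


if __name__ == "__main__":
    # Plain dicts stand in for the parsed secrets file.
    example_secrets = {"huggingface": {"hf_token": "hf_example"}}
    print(get_hf_token(example_secrets, env={}))
```

Inside the Streamlit app, the same helper could presumably be called as `get_hf_token(st.secrets)`, since `st.secrets` exposes the parsed TOML with dictionary-style access. Keeping the lookup in one function means the rest of the code does not need to know whether the token came from the secrets file or the environment.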