---
title: DishDecode
emoji: ⚡
colorFrom: pink
colorTo: pink
sdk: docker
pinned: false
license: mit
short_description: It transforms unstructured recipe videos into structured
---

---

# 📝 **Flask Audio and YouTube Video Processing API**

A Flask-based API that processes audio files and YouTube video transcripts to generate structured recipe information. It uses the Whisper, Deepgram, and Gemini APIs for transcription and data extraction.

## 🚀 **Overview**

This API offers:

1. **Audio Processing**: Download and transcribe audio files.
2. **YouTube Transcription**: Extract transcripts from YouTube videos.
3. **Recipe Data Generation**: Generate detailed recipe data using the Gemini API.

---

## 🧩 **Features**

- **Audio URL Processing**: Download and transcribe audio via Deepgram.
- **YouTube Video Processing**: Extract video transcripts and process them.
- **Structured Output**: Recipe name, ingredients, steps, techniques, and more.
- **Logging and Error Handling**: Debug logging and comprehensive error responses.

---

## ⚙️ **Installation**

### 1. Clone the Repository

```bash
git clone https://github.com/your-repo-name.git
cd your-repo-name
```

### 2. Create a Virtual Environment

```bash
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
```

### 3. Install Dependencies

```bash
pip install -r requirements.txt
```

### 4. Configure Environment Variables

Create a `.env` file with your API keys:

```plaintext
FIRST_API_KEY=your_gemini_api_key
SECOND_API_KEY=your_deepgram_api_key
```

---

## 📡 **API Endpoints**

### ✅ Health Check

**GET /**

```json
{
  "status": "success",
  "message": "API is running successfully!"
}
```

### 🎧 Process Audio URL

**POST /process-audio**

- **Request**:

  ```json
  {
    "audioUrl": "https://example.com/audio.wav"
  }
  ```

- **Response**:

  ```json
  {
    "structured_data": {
      "Recipe Name": "Pasta Alfredo",
      "Ingredients List": ["Pasta", "Cream", "Garlic"],
      ...
    }
  }
  ```

### 📹 Process YouTube Video

**POST /process-youtube**

- **Request**:

  ```json
  {
    "youtube_url": "https://www.youtube.com/watch?v=example"
  }
  ```

- **Response**:

  ```json
  {
    "structured_data": {
      "Recipe Name": "Grilled Cheese Sandwich",
      "Ingredients List": ["Bread", "Cheese", "Butter"],
      ...
    }
  }
  ```

---

## 🛠️ **How It Works**

1. **Audio Processing**:
   - Downloads the audio file.
   - Transcribes it using Deepgram.
   - Sends the transcription to Gemini to extract structured data.
2. **YouTube Processing**:
   - Extracts the video ID.
   - Retrieves the transcript.
   - Sends the transcript to Gemini to extract structured data.

---

## 📦 **Dependencies**

- **Flask**
- **Whisper**
- **Deepgram**
- **Google Gemini API**
- **YouTube Transcript API**
- **Requests**
- **Dotenv**

Install dependencies via:

```bash
pip install -r requirements.txt
```

---

## ▶️ **Run the Application**

1. **Activate Virtual Environment**:

   ```bash
   source venv/bin/activate  # Windows: venv\Scripts\activate
   ```

2. **Run the Flask App**:

   ```bash
   python app.py
   ```

3. **Access**:

   ```
   http://localhost:5000
   ```

---

## 🐞 **Error Handling**

- **API Key Errors**: Ensure `.env` contains valid API keys.
- **Invalid Input**: Returns HTTP 400 when the request is missing its URL.
- **Transcription Errors**: Returns detailed error messages.

---

## 📝 **License**

MIT License.

---

## 👤 **Contributors**

- **Aniket**

---

💬 **Feedback or contributions?** Open an issue or submit a pull request!

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
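
For the **YouTube Processing** flow under *How It Works*, the "Extracts the video ID" step could look roughly like the sketch below. This is only an illustration using Python's standard library: the helper name `extract_video_id` and the set of URL shapes it accepts are assumptions, not taken from `app.py`, and the real implementation may differ.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(youtube_url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if none is found.

    Hypothetical helper: the name and the accepted URL shapes are
    assumptions for illustration, not the project's actual code.
    """
    parsed = urlparse(youtube_url)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        # Standard links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None

        # Embedded and Shorts links: /embed/<id>, /shorts/<id>
        parts = parsed.path.strip("/").split("/")
        if len(parts) >= 2 and parts[0] in ("embed", "shorts"):
            return parts[1] or None

    # Not a recognised YouTube URL
    return None


if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=example"))
```

A `None` result is a natural point to return the 400 response mentioned under **Error Handling**, before any transcript lookup is attempted.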