--- title: PDF to Podcast Converter emoji: 🎙️ colorFrom: green colorTo: purple sdk: docker app_port: 7860 --- # NotebookMg This project converts PDF documents into engaging podcast conversations using AI. It leverages Google's Gemini Pro for text processing and ElevenLabs for voice synthesis. ## Features - PDF text extraction and cleaning - Conversion of academic/technical content into natural dialogue - Dynamic conversation generation between two hosts (Alex and Jamie) - High-quality text-to-speech synthesis - Web interface for easy interaction - API endpoints for programmatic access ## Prerequisites - Python 3.8+ - Google Gemini API key - ElevenLabs API key ## Installation 1. Clone the repository: ```bash git clone cd pdf-to-podcast ``` 2. Install dependencies: ```bash pip install -r requirements.txt ``` 3. Set up environment variables: ```bash # Create a .env file touch .env # Add your API keys echo "GEMINI_API_KEY=your_gemini_api_key" >> .env echo "ELEVEN_API_KEY=your_elevenlabs_api_key" >> .env ``` ## Project Structure ``` pdf-to-podcast/ ├── main.py # Core conversion logic ├── app.py # FastAPI application ├── run.py # Server startup script ├── templates/ # HTML templates │ └── index.html # Web interface ├── uploads/ # Temporary PDF storage └── outputs/ # Generated files ``` ## Usage ### Web Interface 1. Start the server: ```bash python run.py ``` 2. Open your browser and navigate to `http://localhost:8000` 3. Upload a PDF file 4. Download the generated files: - Cleaned text version - Conversation transcript - MP3 podcast file ### API Endpoints - `POST /upload-pdf/`: Upload PDF and generate podcast - `GET /download/{filename}`: Download generated files - `GET /status`: Check API status ## API Examples ```python import requests # Upload PDF with open('document.pdf', 'rb') as f: response = requests.post( 'http://localhost:8000/upload-pdf/', files={'file': f} ) # Download generated podcast response = requests.get( 'http://localhost:8000/download/document_podcast.mp3' ) ``` ## Configuration Voice IDs can be configured in `main.py`: ```python self.alex_voice_id = "21m00Tcm4TlvDq8ikWAM" # Rachel voice self.jamie_voice_id = "IKne3meq5aSn9XLyUdCD" # Adam voice ``` ## Dependencies - `google-generativeai`: Gemini Pro API - `elevenlabs`: Text-to-speech synthesis - `PyPDF2`: PDF processing - `fastapi`: Web API framework - `pydub`: Audio processing - `python-multipart`: File upload handling - `uvicorn`: ASGI server - `jinja2`: Template engine ## Contributing 1. Fork the repository 2. Create a feature branch 3. Commit your changes 4. Push to the branch 5. Create a Pull Request ## License This project is licensed under the MIT License - see the LICENSE file for details. ## Acknowledgments - Google Gemini for AI text processing - ElevenLabs for voice synthesis - FastAPI team for the excellent web framework ## Support For support, please open an issue in the GitHub repository or contact [your-email].