---
title: PDF to Podcast Converter
emoji: 🎙️
colorFrom: green
colorTo: purple
sdk: docker
app_port: 7860
---
# NotebookMg
This project converts PDF documents into podcast-style conversations between two AI hosts. It uses Google's Gemini Pro for text processing and ElevenLabs for voice synthesis.
## Features
- PDF text extraction and cleaning
- Conversion of academic/technical content into natural dialogue
- Dynamic conversation generation between two hosts (Alex and Jamie)
- High-quality text-to-speech synthesis
- Web interface for easy interaction
- API endpoints for programmatic access
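
As a rough illustration of the two-host dialogue feature, the sketch below splits a transcript into speaker turns. It assumes each turn starts with `Alex:` or `Jamie:` on its own line; that format, and the `parse_turns` helper itself, are hypothetical and not taken from `main.py`.

```python
import re

# Assumption: the transcript labels each turn as "Alex: ..." or "Jamie: ...".
SPEAKERS = ("Alex", "Jamie")
_TURN = re.compile(r"^(%s):\s*(.+)$" % "|".join(SPEAKERS))


def parse_turns(transcript):
    """Split a transcript into a list of (speaker, text) tuples."""
    turns = []
    for line in transcript.splitlines():
        line = line.strip()
        match = _TURN.match(line)
        if match:
            # A new labelled turn begins on this line.
            turns.append((match.group(1), match.group(2)))
        elif line and turns:
            # An unlabelled, non-empty line continues the previous turn.
            speaker, text = turns[-1]
            turns[-1] = (speaker, text + " " + line)
    return turns
```

Each resulting turn could then be sent to the text-to-speech step with the voice assigned to that speaker.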
## Prerequisites
- Python 3.8+
- Google Gemini API key
- ElevenLabs API key
## Installation
1. Clone the repository:
```bash
git clone <repository-url>
cd pdf-to-podcast
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Set up environment variables:
```bash
# Create a .env file
touch .env
# Add your API keys
echo "GEMINI_API_KEY=your_gemini_api_key" >> .env
echo "ELEVEN_API_KEY=your_elevenlabs_api_key" >> .env
```
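
To catch a missing key before any API call is made, a startup check along these lines may help. This is a minimal sketch: it assumes the values from `.env` have already been loaded into the process environment (how the project loads the file is not shown here), and `load_api_keys` is a hypothetical helper, not part of the project.

```python
import os

# The two variable names come from the .env example above.
REQUIRED_KEYS = ("GEMINI_API_KEY", "ELEVEN_API_KEY")


def load_api_keys(env=None):
    """Return the required API keys, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_KEYS}
```

Calling `load_api_keys()` once at startup turns a confusing authentication failure later on into an immediate, readable error.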
## Project Structure
```
pdf-to-podcast/
├── main.py              # Core conversion logic
├── app.py               # FastAPI application
├── run.py               # Server startup script
├── templates/           # HTML templates
│   └── index.html       # Web interface
├── uploads/             # Temporary PDF storage
└── outputs/             # Generated files
```
## Usage
### Web Interface
1. Start the server:
```bash
python run.py
```
2. Open your browser and navigate to `http://localhost:8000`
3. Upload a PDF file
4. Download the generated files:
   - Cleaned text version
   - Conversation transcript
   - MP3 podcast file
### API Endpoints
- `POST /upload-pdf/`: Upload PDF and generate podcast
- `GET /download/{filename}`: Download generated files
- `GET /status`: Check API status
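
When calling these endpoints from a script, it can be convenient to build the URLs in one place. The helpers below are a sketch only: `endpoint_url` and `download_url` are hypothetical names, and the base URL assumes the server is reachable at the local address used elsewhere in this README.

```python
from urllib.parse import quote

# Assumption: the server is reachable at this address.
BASE_URL = "http://localhost:8000"


def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and an endpoint path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def download_url(filename, base_url=BASE_URL):
    """Build the download URL for a generated file."""
    # Percent-encode the filename so spaces and special characters survive.
    return endpoint_url("download/" + quote(filename, safe=""), base_url)
```

For example, `endpoint_url("/status")` gives the status URL, and `download_url("my file.mp3")` encodes the space as `%20`.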
## API Examples
```python
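import requests

# Upload a PDF; the server generates the podcast from it
with open('document.pdf', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/upload-pdf/',
        files={'file': f}
    )
response.raise_for_status()  # Fail early on an HTTP error

# Download the generated podcast and save it locally
response = requests.get(
    'http://localhost:8000/download/document_podcast.mp3'
)
response.raise_for_status()
with open('document_podcast.mp3', 'wb') as out:
    out.write(response.content)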
```
## Configuration
Voice IDs can be configured in `main.py`:
```python
self.alex_voice_id = "21m00Tcm4TlvDq8ikWAM" # Rachel voice
self.jamie_voice_id = "IKne3meq5aSn9XLyUdCD" # Adam voice
```
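
If editing `main.py` for every voice change is inconvenient, one option is to let environment variables override the defaults. The sketch below is an assumption, not existing behaviour: the `ALEX_VOICE_ID` and `JAMIE_VOICE_ID` variable names and the `resolve_voice_ids` helper are hypothetical, while the default IDs are the ones shown above.

```python
import os

# Defaults copied from the configuration snippet above.
DEFAULT_VOICES = {
    "Alex": "21m00Tcm4TlvDq8ikWAM",
    "Jamie": "IKne3meq5aSn9XLyUdCD",
}


def resolve_voice_ids(env=None):
    """Return a host-to-voice-ID mapping, preferring environment overrides."""
    env = os.environ if env is None else env
    return {
        # Hypothetical variable names: ALEX_VOICE_ID and JAMIE_VOICE_ID.
        host: env.get(f"{host.upper()}_VOICE_ID") or default
        for host, default in DEFAULT_VOICES.items()
    }
```

An unset or empty variable falls back to the default, so existing setups keep working unchanged.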
## Dependencies
- `google-generativeai`: Gemini Pro API
- `elevenlabs`: Text-to-speech synthesis
- `PyPDF2`: PDF processing
- `fastapi`: Web API framework
- `pydub`: Audio processing
- `python-multipart`: File upload handling
- `uvicorn`: ASGI server
- `jinja2`: Template engine
## Contributing
1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- Google Gemini for AI text processing
- ElevenLabs for voice synthesis
- FastAPI team for the excellent web framework
## Support
For support, please open an issue in the GitHub repository or contact [your-email].