---
title: PDF to Podcast Converter
emoji: 🎙️
colorFrom: green
colorTo: purple
sdk: docker
app_port: 7860
---

# NotebookMg

This project converts PDF documents into podcast-style conversations between two AI hosts. It uses Google's Gemini Pro for text processing and ElevenLabs for voice synthesis.

## Features

- PDF text extraction and cleaning
- Conversion of academic/technical content into natural dialogue
- Dynamic conversation generation between two hosts (Alex and Jamie)
- High-quality text-to-speech synthesis
- Web interface for easy interaction
- API endpoints for programmatic access
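The cleaning step listed above can be pictured with a small sketch. This is not the project's implementation: `clean_extracted_text` is a hypothetical helper, and the rules it applies (rejoining hyphenated line breaks, unwrapping lines, collapsing spaces) are assumptions about what cleaning PDF text typically involves.

```python
import re

def clean_extracted_text(raw: str) -> str:
    """Tidy text pulled from a PDF (hypothetical helper, not the project's code)."""
    # Rejoin words split across a line break: "conver-\nsion" -> "conversion".
    # Caveat: this also merges genuine hyphenated compounds broken at a line end.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Turn single newlines into spaces but leave blank-line paragraph breaks alone
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs
    text = re.sub(r"[ \t]+", " ", text)
    # Trim each paragraph and drop empty ones
    paragraphs = [p.strip() for p in re.split(r"\n{2,}", text)]
    return "\n\n".join(p for p in paragraphs if p)

if __name__ == "__main__":
    sample = "PDF conver-\nsion  is\nfun.\n\n\nNext   para."
    print(clean_extracted_text(sample))
```

In the actual pipeline, Gemini Pro handles text processing, so rule-based cleanup like this would at most be a pre-pass.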

## Prerequisites

- Python 3.8+
- Google Gemini API key
- ElevenLabs API key

## Installation

1. Clone the repository:
```bash
git clone <repository-url>
cd pdf-to-podcast
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Set up environment variables:
```bash
# Create a .env file
touch .env

# Add your API keys
echo "GEMINI_API_KEY=your_gemini_api_key" >> .env
echo "ELEVEN_API_KEY=your_elevenlabs_api_key" >> .env
```
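Before starting the server, it can help to confirm that both keys are actually set. The sketch below uses only the standard library. `parse_env_file` and `missing_keys` are hypothetical helpers, and how the application itself loads `.env` is not shown in this README, so treat this as an independent check rather than the project's loading logic.

```python
import os
from pathlib import Path

# Key names taken from the setup commands above
REQUIRED_KEYS = ("GEMINI_API_KEY", "ELEVEN_API_KEY")

def parse_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def missing_keys(env: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]

if __name__ == "__main__":
    # Real environment variables take precedence over the .env file
    env = {**parse_env_file(), **os.environ}
    missing = missing_keys(env)
    print("Missing keys:", ", ".join(missing) if missing else "none")
```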

## Project Structure

```
pdf-to-podcast/
├── main.py           # Core conversion logic
├── app.py            # FastAPI application
├── run.py            # Server startup script
├── templates/        # HTML templates
│   └── index.html    # Web interface
├── uploads/          # Temporary PDF storage
└── outputs/          # Generated files
```

## Usage

### Web Interface

1. Start the server:
```bash
python run.py
```

2. Open your browser and navigate to `http://localhost:8000`
3. Upload a PDF file
4. Download the generated files:
   - Cleaned text version
   - Conversation transcript
   - MP3 podcast file

### API Endpoints

- `POST /upload-pdf/`: Upload PDF and generate podcast
- `GET /download/{filename}`: Download generated files
- `GET /status`: Check API status
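As a rough illustration of the status and download endpoints, the sketch below uses only the standard library. `BASE_URL`, `download_url`, and `api_is_up` are hypothetical names. The response body of `/status` is not documented here, so the check looks only at the HTTP status code.

```python
from urllib.error import URLError
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumed local address from the Usage section

def download_url(filename: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /download/{filename}, percent-encoding the name."""
    return f"{base_url.rstrip('/')}/download/{quote(filename)}"

def api_is_up(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if GET /status answers with HTTP 200, False on any error."""
    try:
        with urlopen(f"{base_url.rstrip('/')}/status", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        return False

if __name__ == "__main__":
    print("API reachable:", api_is_up())
    print("Download from:", download_url("document_podcast.mp3"))
```

Percent-encoding the filename matters when the uploaded PDF had spaces in its name.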

## API Examples

```python
import requests

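# Upload a PDF; the server generates the podcast from it
with open('document.pdf', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/upload-pdf/',
        files={'file': f}
    )
response.raise_for_status()  # stop here if the upload failed

# Download the generated podcast and save it locally
response = requests.get(
    'http://localhost:8000/download/document_podcast.mp3'
)
response.raise_for_status()
with open('document_podcast.mp3', 'wb') as out:
    out.write(response.content)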
```

## Configuration

Voice IDs can be configured in `main.py`:
```python
self.alex_voice_id = "21m00Tcm4TlvDq8ikWAM"  # Rachel voice
self.jamie_voice_id = "IKne3meq5aSn9XLyUdCD"  # Adam voice
```
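To show how the two voice IDs might map onto a transcript, here is a minimal sketch. It assumes the conversation transcript uses one `Speaker: text` line per turn, which this README does not confirm. `VOICE_IDS` and `assign_voices` are hypothetical names, not part of `main.py`.

```python
# Voice IDs copied from the configuration snippet above
VOICE_IDS = {
    "Alex": "21m00Tcm4TlvDq8ikWAM",   # Rachel voice
    "Jamie": "IKne3meq5aSn9XLyUdCD",  # Adam voice
}

def assign_voices(transcript: str) -> list:
    """Split an assumed 'Speaker: text' transcript into (voice_id, text) pairs."""
    segments = []
    for line in transcript.splitlines():
        # Split on the first colon only, so colons inside the spoken text survive
        speaker, sep, text = line.partition(":")
        speaker = speaker.strip()
        if not sep or speaker not in VOICE_IDS:
            continue  # skip blank lines and anything that is not a host line
        segments.append((VOICE_IDS[speaker], text.strip()))
    return segments

if __name__ == "__main__":
    demo = "Alex: Welcome to the show.\nJamie: Glad to be here."
    for voice_id, text in assign_voices(demo):
        print(voice_id, "->", text)
```

Each pair could then be passed to the text-to-speech step, one request per turn.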

## Dependencies

- `google-generativeai`: Gemini Pro API
- `elevenlabs`: Text-to-speech synthesis
- `PyPDF2`: PDF processing
- `fastapi`: Web API framework
- `pydub`: Audio processing
- `python-multipart`: File upload handling
- `uvicorn`: ASGI server
- `jinja2`: Template engine

## Contributing

1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- Google Gemini for AI text processing
- ElevenLabs for voice synthesis
- FastAPI team for the excellent web framework

## Support

For support, please open an issue in the GitHub repository or contact [your-email].