metadata
title: PDF to Podcast Converter
emoji: 🎙️
colorFrom: green
colorTo: purple
sdk: docker
app_port: 7860
NotebookMg
This project converts PDF documents into two-host podcast conversations using AI. It uses Google's Gemini Pro for text processing and ElevenLabs for voice synthesis.
Features
- PDF text extraction and cleaning
- Conversion of academic/technical content into natural dialogue
- Dynamic conversation generation between two hosts (Alex and Jamie)
- High-quality text-to-speech synthesis
- Web interface for easy interaction
- API endpoints for programmatic access
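As a rough illustration of how a generated conversation could be split into turns before voice synthesis, here is a minimal sketch. It assumes the transcript uses one `Speaker: text` line per turn, which this README does not confirm; `parse_transcript` is a hypothetical helper and not part of the project's code.

```python
import re

# Assumption: each turn starts with "Alex:" or "Jamie:" at the beginning of a line.
TURN_RE = re.compile(r"^\s*(Alex|Jamie)\s*:\s*(.+)$")


def parse_transcript(text):
    """Split a transcript into (speaker, text) tuples.

    Non-empty lines that do not start with a speaker label are treated as
    continuations of the previous turn.
    """
    turns = []
    for line in text.splitlines():
        match = TURN_RE.match(line)
        if match:
            turns.append((match.group(1), match.group(2).strip()))
        elif turns and line.strip():
            speaker, previous = turns[-1]
            turns[-1] = (speaker, previous + " " + line.strip())
    return turns
```

Each tuple could then be passed to the text-to-speech step with the voice that belongs to that host.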
Prerequisites
- Python 3.8+
- Google Gemini API key
- ElevenLabs API key
Installation
- Clone the repository:
git clone <repository-url>
cd pdf-to-podcast
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
# Create a .env file
touch .env
# Add your API keys
echo "GEMINI_API_KEY=your_gemini_api_key" >> .env
echo "ELEVEN_API_KEY=your_elevenlabs_api_key" >> .env
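Before starting the server, it can help to confirm that both keys are actually present. The sketch below reads a `.env` file with the standard library only and reports which of the two keys are missing. `parse_env_file` and `missing_keys` are hypothetical helper names; how the project itself loads `.env` is not stated in this README.

```python
import os
from pathlib import Path

# Key names taken from the setup commands above.
REQUIRED_KEYS = ("GEMINI_API_KEY", "ELEVEN_API_KEY")


def parse_env_file(path=".env"):
    """Return KEY=value pairs from a .env file as a dict (empty if the file is absent)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, environ=None):
    """List required keys found neither in the parsed file nor in the environment."""
    environ = os.environ if environ is None else environ
    return [k for k in REQUIRED_KEYS if not (values.get(k) or environ.get(k))]
```

For example, `missing_keys(parse_env_file())` returns an empty list when both keys are set.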
Project Structure
pdf-to-podcast/
├── main.py          # Core conversion logic
├── app.py           # FastAPI application
├── run.py           # Server startup script
├── templates/       # HTML templates
│   └── index.html   # Web interface
├── uploads/         # Temporary PDF storage
└── outputs/         # Generated files
Usage
Web Interface
- Start the server:
python run.py
- Open your browser and navigate to http://localhost:8000
- Upload a PDF file
- Download the generated files:
  - Cleaned text version
  - Conversation transcript
  - MP3 podcast file
API Endpoints
- POST /upload-pdf/: Upload PDF and generate podcast
- GET /download/{filename}: Download generated files
- GET /status: Check API status
API Examples
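import requests

# Upload a PDF and generate the podcast
with open('document.pdf', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/upload-pdf/',
        files={'file': f}
    )
response.raise_for_status()

# Download the generated podcast and save it locally
response = requests.get(
    'http://localhost:8000/download/document_podcast.mp3'
)
response.raise_for_status()
with open('document_podcast.mp3', 'wb') as out:
    out.write(response.content)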
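For scripts that process more than one PDF, small helpers for building download URLs may be convenient. The sketch below uses only the standard library. The `<name>_podcast.mp3` pattern is inferred from the single example above (`document.pdf` to `document_podcast.mp3`) and may not hold for every file; `podcast_filename`, `download_url`, and `save_download` are hypothetical names, not part of the project.

```python
import shutil
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # taken from the usage section; adjust as needed


def podcast_filename(pdf_path):
    """Guess the podcast file name for a PDF (assumed naming pattern)."""
    return f"{Path(pdf_path).stem}_podcast.mp3"


def download_url(filename, base_url=BASE_URL):
    """Build the GET /download/{filename} URL, percent-encoding the file name."""
    return f"{base_url.rstrip('/')}/download/{quote(filename)}"


def save_download(filename, dest_dir=".", base_url=BASE_URL):
    """Stream a generated file from the server to dest_dir and return its path."""
    target = Path(dest_dir) / filename
    with urlopen(download_url(filename, base_url)) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target
```

With the server running, `save_download(podcast_filename("document.pdf"))` would fetch the MP3 into the current directory.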
Configuration
Voice IDs can be configured in main.py:
self.alex_voice_id = "21m00Tcm4TlvDq8ikWAM" # Rachel voice
self.jamie_voice_id = "IKne3meq5aSn9XLyUdCD" # Adam voice
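If editing main.py for every voice change is inconvenient, one option is to read overrides from the environment and fall back to the IDs shown above. This is only a sketch: the `ALEX_VOICE_ID` and `JAMIE_VOICE_ID` variable names and the `VoiceConfig` class are assumptions for illustration and are not read by the project as published.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceConfig:
    # Defaults copied from the configuration snippet above.
    alex_voice_id: str = "21m00Tcm4TlvDq8ikWAM"   # Rachel voice
    jamie_voice_id: str = "IKne3meq5aSn9XLyUdCD"  # Adam voice


def load_voice_config(environ=None):
    """Return voice IDs, letting hypothetical env vars override the defaults."""
    environ = os.environ if environ is None else environ
    defaults = VoiceConfig()
    return VoiceConfig(
        alex_voice_id=environ.get("ALEX_VOICE_ID", defaults.alex_voice_id),
        jamie_voice_id=environ.get("JAMIE_VOICE_ID", defaults.jamie_voice_id),
    )
```

The resulting values could then be assigned to `self.alex_voice_id` and `self.jamie_voice_id` in place of the hard-coded strings.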
Dependencies
- google-generativeai: Gemini Pro API
- elevenlabs: Text-to-speech synthesis
- PyPDF2: PDF processing
- fastapi: Web API framework
- pydub: Audio processing
- python-multipart: File upload handling
- uvicorn: ASGI server
- jinja2: Template engine
Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Google Gemini for AI text processing
- ElevenLabs for voice synthesis
- FastAPI team for the excellent web framework
Support
For support, please open an issue in the GitHub repository or contact [your-email].