title: PDF to Podcast Converter
emoji: 🎙️
colorFrom: green
colorTo: purple
sdk: docker
app_port: 7860

NotebookMg

This project converts PDF documents into podcast-style conversations between two AI hosts. It uses Google's Gemini Pro to clean the extracted text and rewrite it as dialogue, and ElevenLabs to synthesize the voices.

Features

  • PDF text extraction and cleaning
  • Conversion of academic/technical content into natural dialogue
  • Dynamic conversation generation between two hosts (Alex and Jamie)
  • High-quality text-to-speech synthesis
  • Web interface for easy interaction
  • API endpoints for programmatic access

Prerequisites

  • Python 3.8+
  • Google Gemini API key
  • ElevenLabs API key

Installation

  1. Clone the repository:
git clone <repository-url>
cd pdf-to-podcast
  2. Install dependencies:
pip install -r requirements.txt
  3. Set up environment variables:
# Create a .env file
touch .env

# Add your API keys
echo "GEMINI_API_KEY=your_gemini_api_key" >> .env
echo "ELEVEN_API_KEY=your_elevenlabs_api_key" >> .env
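
Before starting the server, it can help to confirm that both keys are actually present in the .env file. The following is a minimal sketch using only the standard library; the helper names load_env_file and missing_keys are hypothetical and not part of this project, and the project itself may load the file differently.

```python
from pathlib import Path

# Key names taken from the installation step above
REQUIRED_KEYS = ("GEMINI_API_KEY", "ELEVEN_API_KEY")


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Hypothetical helper: blank lines, comments, and lines without '='
    are skipped; surrounding quotes around values are removed.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required key names that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling missing_keys(load_env_file()) returns an empty list when both keys are set, which makes it easy to fail early with a clear message instead of hitting an authentication error mid-conversion.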

Project Structure

pdf-to-podcast/
├── main.py          # Core conversion logic
├── app.py           # FastAPI application
├── run.py           # Server startup script
├── templates/       # HTML templates
│   └── index.html   # Web interface
├── uploads/         # Temporary PDF storage
└── outputs/         # Generated files

Usage

Web Interface

  1. Start the server:
python run.py
  2. Open your browser and navigate to http://localhost:8000
  3. Upload a PDF file
  4. Download the generated files:
    • Cleaned text version
    • Conversation transcript
    • MP3 podcast file

API Endpoints

  • POST /upload-pdf/: Upload PDF and generate podcast
  • GET /download/{filename}: Download generated files
  • GET /status: Check API status
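
The endpoints above can be wrapped in two small client helpers. This is a sketch under stated assumptions: the base URL is the localhost address from the Web Interface section, the function names download_url and api_is_up are hypothetical, and GET /status is assumed to answer with HTTP 200 when the API is healthy (the response body is not documented here, so it is not inspected).

```python
import urllib.request
from urllib.parse import quote

# Assumption: the server runs at the address given in the Web Interface section
BASE_URL = "http://localhost:8000"


def download_url(filename, base_url=BASE_URL):
    """Build the GET /download/{filename} URL, percent-encoding the name."""
    return f"{base_url}/download/{quote(filename)}"


def api_is_up(base_url=BASE_URL, timeout=5):
    """Return True if GET /status answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(f"{base_url}/status", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts all subclass OSError
        return False
```

Checking api_is_up() before uploading avoids sending a large PDF to a server that is not running, and quoting the filename keeps names with spaces valid in the URL path.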

API Examples

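import requests

BASE_URL = 'http://localhost:8000'

# Upload the PDF; the server generates the podcast files
with open('document.pdf', 'rb') as f:
    response = requests.post(f'{BASE_URL}/upload-pdf/', files={'file': f})
response.raise_for_status()  # stop here if the upload failed

# Download the generated podcast and save it locally
response = requests.get(f'{BASE_URL}/download/document_podcast.mp3')
response.raise_for_status()
with open('document_podcast.mp3', 'wb') as out:
    out.write(response.content)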

Configuration

Voice IDs can be configured in main.py:

self.alex_voice_id = "21m00Tcm4TlvDq8ikWAM"  # Rachel voice
self.jamie_voice_id = "IKne3meq5aSn9XLyUdCD"  # Adam voice
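
To illustrate how the two voice IDs relate to the hosts, here is a minimal sketch that maps each line of a transcript to the voice that would read it. It assumes the transcript uses one "Speaker: text" line per turn, which is a guess about the format; VOICE_IDS, parse_turns, and assign_voices are hypothetical names, and the actual logic in main.py may differ.

```python
# Voice IDs copied from the configuration above, keyed by host name
VOICE_IDS = {
    "Alex": "21m00Tcm4TlvDq8ikWAM",
    "Jamie": "IKne3meq5aSn9XLyUdCD",
}


def parse_turns(transcript):
    """Split 'Speaker: text' lines into (speaker, text) pairs.

    Assumed format, not confirmed by the project: lines from unknown
    speakers, blank lines, and lines without a colon are skipped.
    """
    turns = []
    for line in transcript.splitlines():
        speaker, sep, text = line.partition(":")
        speaker = speaker.strip()
        text = text.strip()
        if sep and speaker in VOICE_IDS and text:
            turns.append((speaker, text))
    return turns


def assign_voices(turns):
    """Pair each turn's text with the voice ID configured for its speaker."""
    return [(VOICE_IDS[speaker], text) for speaker, text in turns]
```

Keeping the mapping in one dictionary means swapping a host's voice is a one-line change, and each (voice_id, text) pair can then be passed to whatever text-to-speech call the project uses.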

Dependencies

  • google-generativeai: Gemini Pro API
  • elevenlabs: Text-to-speech synthesis
  • PyPDF2: PDF processing
  • fastapi: Web API framework
  • pydub: Audio processing
  • python-multipart: File upload handling
  • uvicorn: ASGI server
  • jinja2: Template engine

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Google Gemini for AI text processing
  • ElevenLabs for voice synthesis
  • FastAPI team for the excellent web framework

Support

For support, please open an issue in the GitHub repository or contact [your-email].