PDF to Podcast Converter

Overview

This project converts a PDF document into a podcast episode. Google Gemini turns the PDF's text into a natural, conversational dialogue suited to audio, and OpenAI's text-to-speech models voice that dialogue and output it as an MP3 file.

Features

  • Convert PDF to Podcast: Upload a PDF and convert its content into a podcast dialogue.
  • Engaging Dialogue: The generated dialogue is designed to be informative and entertaining.
  • Multiple Voices: Each line of dialogue is read by one of three OpenAI voices (alloy, onyx, or fable), assigned by the LLM when it writes the script.
  • Simple Interface: A Gradio web UI for uploading a PDF and listening to the result.

Installation

To set up the project, follow these steps:

  1. Clone the repository:

    git clone https://github.com/knowsuchagency/pdf-to-podcast.git
    cd pdf-to-podcast
    
  2. Create a virtual environment and activate it:

    python -m venv .venv
    source .venv/bin/activate
    
  3. Install the required packages:

    pip install -r requirements.txt
    

Usage

  1. Set up API keys: You need a Google Gemini API key, which you can get at https://aistudio.google.com/app/apikey. Set it as the GEMINI_API_KEY environment variable. You also need an OpenAI API key, which you can either enter in the interface or set as the OPENAI_API_KEY environment variable.

    Gemini 1.5 Flash is used as the LLM, and OpenAI is used for text-to-speech.

  2. Run the application:

    python main.py
    

    This starts a Gradio web app. Open the local URL printed in the terminal (http://127.0.0.1:7860 by default) in your web browser.

  3. Upload a PDF: Upload the PDF document you want to convert into a podcast.

  4. Enter OpenAI API Key: Provide your OpenAI API key in the designated textbox, or leave it blank if you set OPENAI_API_KEY in step 1.

  5. Generate Audio: Click Submit to start the conversion. The output is an MP3 file of the podcast dialogue, which you can play or download in the interface.
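Step 1 says the OpenAI key can come from either the textbox or the OPENAI_API_KEY environment variable, while Gemini needs GEMINI_API_KEY. Below is a minimal sketch of checking both before any work starts. The helper names resolve_openai_key and require_gemini_key are hypothetical (not part of main.py), and the assumption that a non-blank textbox value takes precedence over the environment is ours.

```python
import os
from typing import Mapping, Optional


def resolve_openai_key(
    ui_key: Optional[str], env: Mapping[str, str] = os.environ
) -> str:
    """Pick the OpenAI key: the textbox value first, then OPENAI_API_KEY.

    Hypothetical helper; the precedence order is an assumption.
    """
    if ui_key and ui_key.strip():
        return ui_key.strip()
    env_key = env.get("OPENAI_API_KEY", "").strip()
    if env_key:
        return env_key
    raise ValueError(
        "No OpenAI API key: enter one in the interface or set OPENAI_API_KEY."
    )


def require_gemini_key(env: Mapping[str, str] = os.environ) -> str:
    """Return GEMINI_API_KEY, failing early with a helpful message if unset."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise ValueError(
            "Set GEMINI_API_KEY (get a key at https://aistudio.google.com/app/apikey)."
        )
    return key
```

For example, resolve_openai_key("", os.environ) falls back to the environment variable when the textbox is left empty, and raises a clear error if neither source provides a key.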

Project Structure

  • main.py: Main application script.
  • requirements.txt: List of dependencies.
  • README.md: Project documentation (this file).

Code Explanation

Dialogue Models

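Defines the structure of the dialogue using Pydantic models. Each DialogueItem pairs one line of text with the OpenAI voice that will read it, and Dialogue is the structured output the LLM must return.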

from typing import List, Literal

from pydantic import BaseModel

class DialogueItem(BaseModel):
    text: str
    voice: Literal["alloy", "onyx", "fable"]

class Dialogue(BaseModel):
    scratchpad: str
    dialogue: List[DialogueItem]
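To show what a valid LLM response looks like, here is a dependency-free sketch that mirrors the models above with dataclasses and rejects any voice outside the three allowed ones. It is an illustrative stand-in for the Pydantic validation, not the project's code; parse_dialogue and the sample payload are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List

ALLOWED_VOICES = {"alloy", "onyx", "fable"}  # same choices as the Literal above


@dataclass
class DialogueItem:
    text: str
    voice: str


@dataclass
class Dialogue:
    scratchpad: str
    dialogue: List[DialogueItem]


def parse_dialogue(raw: str) -> Dialogue:
    """Parse an LLM JSON response, rejecting voices outside ALLOWED_VOICES."""
    data = json.loads(raw)
    items = []
    for entry in data["dialogue"]:
        if entry["voice"] not in ALLOWED_VOICES:
            raise ValueError(f"unsupported voice: {entry['voice']!r}")
        items.append(DialogueItem(text=str(entry["text"]), voice=entry["voice"]))
    return Dialogue(scratchpad=str(data["scratchpad"]), dialogue=items)


# Hypothetical sample payload in the shape the models expect.
example = """{
  "scratchpad": "Plan: open with the paper's main claim, then discuss results.",
  "dialogue": [
    {"text": "Welcome to the show! Today we're reading a new paper.", "voice": "alloy"},
    {"text": "Thanks! Let's start with the problem it tackles.", "voice": "onyx"}
  ]
}"""

parsed = parse_dialogue(example)
print([item.voice for item in parsed.dialogue])  # ['alloy', 'onyx']
```

With the real Pydantic models (v2), Dialogue.model_validate_json(raw) performs the equivalent check and raises a ValidationError for an unsupported voice.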

LLM Function

Generates a Dialogue from the input text using promptic's llm decorator, which sends the request to Gemini 1.5 Flash and parses the response into the Dialogue model.

from promptic import llm

@llm(model="gemini/gemini-1.5-flash")
def generate_dialogue(text: str) -> Dialogue:
    ...  # Prompt and body elided here; see main.py

TTS Function

Converts one piece of text to speech in the given voice using OpenAI's text-to-speech API and returns the audio as MP3 bytes.

from typing import Optional

def get_mp3(text: str, voice: str, api_key: Optional[str] = None) -> bytes:
    ...  # Implementation in main.py: generate MP3 bytes from text
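One way get_mp3 might be written with the official openai Python SDK (v1.x) is sketched below. The model name tts-1 and the extra client parameter (used only to inject a fake client in tests) are assumptions for illustration; main.py may differ. Passing api_key=None lets the SDK fall back to the OPENAI_API_KEY environment variable, which matches the two ways of supplying the key described under Usage.

```python
from typing import Any, Optional


def get_mp3(
    text: str, voice: str, api_key: Optional[str] = None, client: Any = None
) -> bytes:
    """Synthesize `text` with an OpenAI TTS voice and return MP3 bytes.

    Sketch only: the model name "tts-1" and the `client` parameter
    (for injecting a fake in tests) are assumptions, not taken from main.py.
    """
    if client is None:
        from openai import OpenAI  # imported lazily so tests can run without it

        # With api_key=None the SDK reads OPENAI_API_KEY from the environment.
        client = OpenAI(api_key=api_key)
    response = client.audio.speech.create(
        model="tts-1",  # assumed model name
        voice=voice,  # one of "alloy", "onyx", "fable" in this project
        input=text,
        response_format="mp3",
    )
    return response.content  # raw MP3 bytes
```

For example, writing the result of get_mp3("Welcome to the show!", "alloy") to a file with a .mp3 extension should give a playable clip.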

Main Function

Reads the uploaded PDF, generates the dialogue with generate_dialogue, and converts it to audio with get_mp3.

def generate_audio(file: bytes, openai_api_key: str) -> bytes:
    ...  # Implementation in main.py: process the PDF and generate audio
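The main function chains the pieces together: extract the PDF's text, ask the LLM for a Dialogue, synthesize each line in its assigned voice, and join the audio. Below is a hedged sketch of that flow, not main.py's code. The name generate_audio_sketch is hypothetical, the already-extracted page texts and the two model calls are passed in as arguments so the sketch runs without API keys, and simply concatenating MP3 byte strings is an assumption that usually plays back correctly because MP3 audio is a sequence of independent frames.

```python
from typing import Any, Callable, List


def generate_audio_sketch(
    page_texts: List[str],
    generate_dialogue: Callable[[str], Any],
    get_mp3: Callable[[str, str], bytes],
) -> bytes:
    """Sketch of the PDF -> dialogue -> audio pipeline (not main.py's code).

    `page_texts` stands in for text already extracted from the PDF;
    `generate_dialogue` must return an object with a `.dialogue` list of
    items that have `.text` and `.voice`, like the Dialogue model.
    """
    # 1. Join the text of every page into one input for the LLM.
    text = "\n\n".join(page_texts)
    # 2. Ask the LLM for a structured, scripted conversation.
    result = generate_dialogue(text)
    # 3. Synthesize each line in its assigned voice, keeping the order.
    clips = [get_mp3(item.text, item.voice) for item in result.dialogue]
    # 4. Concatenate the per-line MP3 clips into one episode.
    return b"".join(clips)
```

In the app itself, the page texts would come from a PDF library reading the uploaded bytes, and the OpenAI key could be bound to get_mp3 with functools.partial(get_mp3, api_key=key) before passing it in.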

Gradio Interface

Creates a user-friendly interface for uploading PDFs and generating podcasts.

import gradio as gr

demo = gr.Interface(
    title="PDF to Podcast",
    description="Convert any PDF document into an engaging podcast episode!",
    fn=generate_audio,
    inputs=[
        gr.File(label="Input PDF", type="binary"),
        gr.Textbox(label="OpenAI API Key", placeholder="Enter your OpenAI API key here"),
    ],
    outputs=[
        gr.Audio(format="mp3"),
    ],
)

demo.launch(show_api=False)

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for more information.