--- title: Research Companion emoji: 🏢 colorFrom: pink colorTo: red sdk: gradio sdk_version: 5.5.0 app_file: app.py pinned: false short_description: AI tool turning Academic papers into podcasts --- # AI Research Companion - Transforming Research Papers into Podcasts ## Overview The AI Research Companion is an innovative tool designed to make academic research more accessible. It transforms complex, text-heavy research papers into audio podcasts, enabling users to consume academic content in a more engaging and convenient way. This project was initially developed during the Smart India Hackathon (SIH) in 2023 to address the overwhelming challenge of managing and understanding a large number of research papers. It leverages large language models (LLMs) to extract relevant text, generate readable transcripts, and convert these into audio podcasts. ## Features - **Text Extraction:** Extracts content from uploaded PDFs to create clean, readable text. - **Transcript Generation:** Uses AI to generate a coherent transcript from the extracted text. - **TTS (Text-to-Speech):** Converts the refined transcript into an audio file. - **Editable Transcript:** Users can modify the transcript before converting it into audio, allowing for better control over the final output. - **Audio Output:** Listen to the final generated podcast from the research paper. ## Development Status The tool is still under development with plans to: - Integrate web search capabilities to find related research. - Explore additional Text-to-Speech engines to enhance the audio output. ## Requirements - Python 3.7 or higher - Gradio - Various AI/LLM APIs (configured in the `config` directory) - Edge TTS for audio generation ## Setup Instructions 1. Clone this repository to your local machine: ```bash git clone ``` 2. Install the required dependencies: ```bash pip install -r requirements.txt ``` 3. Set up API keys for the LLM models in the `config` directory. ## Usage 1. **Upload PDF:** Start by uploading a research paper in PDF format. 2. **Select Model:** Choose the text model for processing the document. 3. **Text Preview:** Preview the extracted text before proceeding. 4. **Transcript Preview:** Review the generated transcript and make edits if needed. 5. **TTS Output:** After finalizing the transcript, generate the audio podcast from the text. ## Note: This tool uses APIs for LLMs, but if GPUs are available, you can easily switch the API base to local models like "ollama" for enhanced performance. ## Acknowledgements Special thanks to [yasserrmd](https://huggingface.co./spaces/yasserrmd/NotebookLlama) for inspiring the structured prompts that guide this project. ## License This project is open source under the MIT License. Feel free to contribute and improve the tool.