File size: 2,834 Bytes
94bf4c8
 
 
 
 
 
 
 
 
 
 
 
289b1c0
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
---
title: Research Companion
emoji: 🏢
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
short_description: AI tool turning Academic papers into podcasts
---

# AI Research Companion - Transforming Research Papers into Podcasts

## Overview
The AI Research Companion is an innovative tool designed to make academic research more accessible. It transforms complex, text-heavy research papers into audio podcasts, enabling users to consume academic content in a more engaging and convenient way.

This project was initially developed during the Smart India Hackathon (SIH) in 2023 to address the overwhelming challenge of managing and understanding a large number of research papers. It leverages large language models (LLMs) to extract relevant text, generate readable transcripts, and convert these into audio podcasts.

## Features
- **Text Extraction:** Extracts content from uploaded PDFs to create clean, readable text.
- **Transcript Generation:** Uses AI to generate a coherent transcript from the extracted text.
- **TTS (Text-to-Speech):** Converts the refined transcript into an audio file.
- **Editable Transcript:** Users can modify the transcript before converting it into audio, allowing for better control over the final output.
- **Audio Output:** Listen to the final generated podcast from the research paper.

## Development Status
The tool is still under development with plans to:
- Integrate web search capabilities to find related research.
- Explore additional Text-to-Speech engines to enhance the audio output.

## Requirements
- Python 3.7 or higher
- Gradio
- Various AI/LLM APIs (configured in the `config` directory)
- Edge TTS for audio generation

## Setup Instructions
1. Clone this repository to your local machine:
    ```bash
    git clone <repository_url>
    ```
2. Install the required dependencies:
    ```bash
    pip install -r requirements.txt
    ```
3. Set up API keys for the LLM models in the `config` directory.

## Usage
1. **Upload PDF:** Start by uploading a research paper in PDF format.
2. **Select Model:** Choose the text model for processing the document.
3. **Text Preview:** Preview the extracted text before proceeding.
4. **Transcript Preview:** Review the generated transcript and make edits if needed.
5. **TTS Output:** After finalizing the transcript, generate the audio podcast from the text.

## Note:
This tool uses APIs for LLMs, but if GPUs are available, you can easily switch the API base to local models like "ollama" for enhanced performance.

## Acknowledgements
Special thanks to [yasserrmd](https://huggingface.co./spaces/yasserrmd/NotebookLlama) for inspiring the structured prompts that guide this project.

## License
This project is open source under the MIT License. Feel free to contribute and improve the tool.