Spaces:

Tanmay09516
/

Research-Companion

Sleeping

App Files Files Community

Research-Companion / README.md

Tanmay Jain

minor text changes

289b1c0 3 months ago

preview code

raw

history blame contribute delete

2.83 kB

	---
	title: Research Companion
	emoji: 🏢
	colorFrom: pink
	colorTo: red
	sdk: gradio
	sdk_version: 5.5.0
	app_file: app.py
	pinned: false
	short_description: AI tool turning Academic papers into podcasts
	---

	# AI Research Companion - Transforming Research Papers into Podcasts

	## Overview
	The AI Research Companion is an innovative tool designed to make academic research more accessible. It transforms complex, text-heavy research papers into audio podcasts, enabling users to consume academic content in a more engaging and convenient way.

	This project was initially developed during the Smart India Hackathon (SIH) in 2023 to address the overwhelming challenge of managing and understanding a large number of research papers. It leverages large language models (LLMs) to extract relevant text, generate readable transcripts, and convert these into audio podcasts.

	## Features
	- Text Extraction: Extracts content from uploaded PDFs to create clean, readable text.
	- Transcript Generation: Uses AI to generate a coherent transcript from the extracted text.
	- TTS (Text-to-Speech): Converts the refined transcript into an audio file.
	- Editable Transcript: Users can modify the transcript before converting it into audio, allowing for better control over the final output.
	- Audio Output: Listen to the final generated podcast from the research paper.

	## Development Status
	The tool is still under development with plans to:
	- Integrate web search capabilities to find related research.
	- Explore additional Text-to-Speech engines to enhance the audio output.

	## Requirements
	- Python 3.7 or higher
	- Gradio
	- Various AI/LLM APIs (configured in the `config` directory)
	- Edge TTS for audio generation

	## Setup Instructions
	1. Clone this repository to your local machine:
	```bash
	git clone <repository_url>
	```
	2. Install the required dependencies:
	```bash
	pip install -r requirements.txt
	```
	3. Set up API keys for the LLM models in the `config` directory.

	## Usage
	1. Upload PDF: Start by uploading a research paper in PDF format.
	2. Select Model: Choose the text model for processing the document.
	3. Text Preview: Preview the extracted text before proceeding.
	4. Transcript Preview: Review the generated transcript and make edits if needed.
	5. TTS Output: After finalizing the transcript, generate the audio podcast from the text.

	## Note:
	This tool uses APIs for LLMs, but if GPUs are available, you can easily switch the API base to local models like "ollama" for enhanced performance.

	## Acknowledgements
	Special thanks to [yasserrmd](https://huggingface.co./spaces/yasserrmd/NotebookLlama) for inspiring the structured prompts that guide this project.

	## License
	This project is open source under the MIT License. Feel free to contribute and improve the tool.