Research-Companion / README.md
Tanmay Jain
minor text changes
289b1c0

A newer version of the Gradio SDK is available: 5.14.0

Upgrade
metadata
title: Research Companion
emoji: 🏢
colorFrom: pink
colorTo: red
sdk: gradio
sdk_version: 5.5.0
app_file: app.py
pinned: false
short_description: AI tool turning Academic papers into podcasts

AI Research Companion - Transforming Research Papers into Podcasts

Overview

The AI Research Companion is an innovative tool designed to make academic research more accessible. It transforms complex, text-heavy research papers into audio podcasts, enabling users to consume academic content in a more engaging and convenient way.

This project was initially developed during the Smart India Hackathon (SIH) in 2023 to address the overwhelming challenge of managing and understanding a large number of research papers. It leverages large language models (LLMs) to extract relevant text, generate readable transcripts, and convert these into audio podcasts.

Features

  • Text Extraction: Extracts content from uploaded PDFs to create clean, readable text.
  • Transcript Generation: Uses AI to generate a coherent transcript from the extracted text.
  • TTS (Text-to-Speech): Converts the refined transcript into an audio file.
  • Editable Transcript: Users can modify the transcript before converting it into audio, allowing for better control over the final output.
  • Audio Output: Listen to the final generated podcast from the research paper.

Development Status

The tool is still under development with plans to:

  • Integrate web search capabilities to find related research.
  • Explore additional Text-to-Speech engines to enhance the audio output.

Requirements

  • Python 3.7 or higher
  • Gradio
  • Various AI/LLM APIs (configured in the config directory)
  • Edge TTS for audio generation

Setup Instructions

  1. Clone this repository to your local machine:
    git clone <repository_url>
    
  2. Install the required dependencies:
    pip install -r requirements.txt
    
  3. Set up API keys for the LLM models in the config directory.

Usage

  1. Upload PDF: Start by uploading a research paper in PDF format.
  2. Select Model: Choose the text model for processing the document.
  3. Text Preview: Preview the extracted text before proceeding.
  4. Transcript Preview: Review the generated transcript and make edits if needed.
  5. TTS Output: After finalizing the transcript, generate the audio podcast from the text.

Note:

This tool uses APIs for LLMs, but if GPUs are available, you can easily switch the API base to local models like "ollama" for enhanced performance.

Acknowledgements

Special thanks to yasserrmd for inspiring the structured prompts that guide this project.

License

This project is open source under the MIT License. Feel free to contribute and improve the tool.