metadata
title: Test
emoji: 🐠
colorFrom: pink
colorTo: pink
sdk: docker
pinned: false

AI-Powered Question & Answer Generator with Voice Cloning


Overview

This project combines a fine-tuned language model with voice cloning so that AI-generated answers are spoken in a cloned voice. Its main components are:

  1. Text Generation: A fine-tuned Mistral-7B-v0.1 model generates realistic, human-like answers to user-provided questions.
  2. Voice Cloning: Using the ElevenLabs API, we clone a voice and synthesize the AI-generated answers into natural-sounding speech.
  3. Deception for Interaction: The system is designed to mislead players (French: "tromper") by making the responses appear to come from a real human.

Key Features

  1. Fine-Tuned Model for Text Generation:

    • The project utilizes the Mistral-7B-v0.1 model fine-tuned on a custom dataset.
    • The model generates contextually accurate, human-like responses to a wide range of questions.
  2. Voice Cloning with ElevenLabs:

    • ElevenLabs’ Voice Cloning and Text-to-Speech API is used to replicate a target voice.
    • The cloned voice delivers the AI-generated answers in a natural and believable manner.
  3. Integration for Immersion:

    • The generated answers and synthesized speech are integrated to provide seamless interaction.
    • Designed for applications in gaming, interactive storytelling, or prank scenarios.

How It Works

1. Question Input:

  • Users provide a question in text form (e.g., "What’s the best way to prepare for a long flight?").
  • Alternatively, voice input can be transcribed to text using ElevenLabs’ speech-to-text feature.
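
The two input paths above can be sketched as one helper. This is only an illustration, not the project's code: `get_question` and the `transcribe` callable are hypothetical names, and `transcribe` stands in for whatever speech-to-text call the project actually makes.

```python
from typing import Callable, Optional


def get_question(
    text: Optional[str] = None,
    audio_path: Optional[str] = None,
    transcribe: Optional[Callable[[str], str]] = None,
) -> str:
    """Return the question as clean text, from typed text or transcribed audio.

    `transcribe` is a hypothetical callable (audio path -> text) standing in
    for a speech-to-text service. It is only used when no text is given.
    """
    if text is None:
        if audio_path is None or transcribe is None:
            raise ValueError("provide text, or audio_path together with transcribe")
        text = transcribe(audio_path)
    # Collapse runs of whitespace so the model receives a single clean line.
    question = " ".join(text.split())
    if not question:
        raise ValueError("question is empty")
    return question
```

Typed text takes priority here, so the transcription service is only called when it is needed.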

2. Text Generation:

  • The Mistral-7B-v0.1 model processes the input question and generates a natural response.
  • Example:
    • Question: "What’s your favorite place to relax?"
    • Answer: "My room, where I can unwind and enjoy some quiet time."
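
A minimal sketch of this step with the Hugging Face `transformers` pipeline follows. Several things are assumptions rather than details from this project: the "Question: ... Answer:" prompt template, the sampling settings, and the helper names `build_prompt`, `load_generator`, and `generate_answer`. The README does not name the fine-tuned checkpoint, so the base model ID is used as a placeholder.

```python
from typing import Callable

# Base model ID on the Hugging Face Hub. The project's fine-tuned checkpoint is
# not named in this README, so replace this with its path or repo ID.
MODEL_ID = "mistralai/Mistral-7B-v0.1"


def build_prompt(question: str) -> str:
    """Format a question for a completion-style model (assumed template)."""
    return f"Question: {question.strip()}\nAnswer:"


def load_generator(model_id: str = MODEL_ID) -> Callable[[str], str]:
    """Load a text-generation pipeline and wrap it as prompt -> completion."""
    # Imported lazily: transformers and torch are heavy dependencies, and
    # loading a 7B model needs a GPU or a large amount of RAM.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_id)

    def generate(prompt: str) -> str:
        outputs = pipe(
            prompt,
            max_new_tokens=64,
            do_sample=True,
            temperature=0.7,
            return_full_text=False,  # return only the completion, not the prompt
        )
        return outputs[0]["generated_text"]

    return generate


def generate_answer(question: str, generate: Callable[[str], str]) -> str:
    """Generate an answer and keep only its first line.

    A completion model may continue with further "Question:" turns, so the
    output is cut at the first newline.
    """
    completion = generate(build_prompt(question)).strip()
    return completion.split("\n", 1)[0].strip()
```

With the model available locally, a call would look like `generate_answer("What's your favorite place to relax?", load_generator())`. Passing the generator in as a callable keeps the prompt logic testable without loading the model.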

3. Voice Cloning:

  • The generated text is sent to ElevenLabs’ API, where it is converted into speech using a cloned voice.
  • The voice sounds human, complete with natural intonation and emotion.
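
The request to ElevenLabs might look like the sketch below, which calls the public text-to-speech REST endpoint using only the standard library. The voice ID and API key are placeholders, the `model_id` and `voice_settings` values are illustrative assumptions, and the helper names `build_tts_request` and `synthesize` are hypothetical. The project itself may use an ElevenLabs client library instead.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(
    text: str,
    voice_id: str,
    api_key: str,
    model_id: str = "eleven_multilingual_v2",  # assumed model choice
) -> urllib.request.Request:
    """Build the POST request that asks ElevenLabs to speak `text`.

    `voice_id` identifies the cloned voice in the ElevenLabs account.
    """
    payload = {
        "text": text,
        "model_id": model_id,
        # Illustrative values, not tuned for any particular voice.
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    }
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize(text: str, voice_id: str, api_key: str) -> bytes:
    """Send the request and return the raw audio bytes from the response."""
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()
```

Keeping request construction separate from the network call makes it possible to inspect the URL, headers, and body without spending API credits.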

4. Output Delivery:

  • The final output is an audio response in the cloned voice, intended to be hard to tell apart from a real human speaker.
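
The four steps can be chained as in the sketch below. It assumes the text generator and the speech synthesizer are supplied as callables, so it does not depend on any specific model or API. The function name `answer_to_audio` and the default output file name are hypothetical.

```python
from pathlib import Path
from typing import Callable


def answer_to_audio(
    question: str,
    generate_answer: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    out_path: str = "answer.mp3",
) -> Path:
    """Run question -> answer text -> speech audio and save the audio file.

    `generate_answer` maps a question to answer text, and `synthesize` maps
    text to audio bytes. Both are injected by the caller.
    """
    answer = generate_answer(question)
    audio = synthesize(answer)
    if not audio:
        raise RuntimeError("speech synthesis returned no audio")
    path = Path(out_path)
    path.write_bytes(audio)  # the saved file is what gets played to the player
    return path
```

Injecting both stages as callables means either one can be replaced by a stub during development, for example to test the flow without a GPU or an API key.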

Applications

  • Gaming: Use in trivia or role-playing games to simulate human-like NPCs.
  • Storytelling: Create immersive audio experiences by combining generated text with realistic voiceovers.
  • Social Experiments: Test human reactions to AI-generated, voice-synthesized responses in various scenarios.
  • Entertainment/Pranks: Surprise players or audiences with a system that convincingly mimics a real human.

Technologies Used

  1. Mistral-7B-v0.1:

    • A large language model, fine-tuned in this project for text generation.
    • Delivers contextually accurate and relatable answers.
  2. ElevenLabs API:

    • Speech-to-Text: Converts spoken questions into text for the model to process.
    • Voice Cloning: Synthesizes text into speech using a cloned voice.
  3. Python:

    • Backend logic for integrating text generation, voice synthesis, and API calls.
    • Frameworks and libraries include transformers, torch, and API wrappers for ElevenLabs.

Setup Instructions

1. Clone the Repository:

git clone https://github.com/Lirone/NotMe.git
cd NotMe