metadata

title: Test
emoji: 🐠
colorFrom: pink
colorTo: pink
sdk: docker
pinned: false

AI-Powered Question & Answer Generator with Voice Cloning

Overview

This project pairs a fine-tuned language model with voice cloning so that AI-generated answers are spoken in a cloned voice. Its main components are:

Text Generation: A fine-tuned Mistral-7B-v0.1 model generates realistic, human-like answers to user-provided questions.
Voice Cloning: The ElevenLabs API clones a voice and synthesizes the AI-generated answers into natural-sounding speech.
Deception for Interaction: The system is designed to mislead ("tromper" in French) players by making the responses appear to come from a real human.

Key Features

Fine-Tuned Model for Text Generation:
- The project utilizes the Mistral-7B-v0.1 model fine-tuned on a custom dataset.
- The model generates contextually accurate, human-like responses to a wide range of questions.
Voice Cloning with ElevenLabs:
- ElevenLabs’ Voice Cloning API is used to replicate a target voice.
- The cloned voice delivers the AI-generated answers in a natural and believable manner.
Integration for Immersion:
- The generated answers and synthesized speech are integrated to provide seamless interaction.
- Designed for applications in gaming, interactive storytelling, or prank scenarios.

How It Works

1. Question Input:

Users provide a question in text form (e.g., "What’s the best way to prepare for a long flight?").
Alternatively, voice input can be transcribed to text using ElevenLabs’ speech-to-text feature.
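The two input paths described above can be folded into one entry point. The following is a minimal sketch, not the project's actual code: `resolve_question` and its `transcribe` parameter are hypothetical names, and `transcribe` stands in for whatever wrapper calls ElevenLabs' speech-to-text feature.

```python
from typing import Callable, Optional


def resolve_question(
    text: Optional[str] = None,
    audio: Optional[bytes] = None,
    transcribe: Optional[Callable[[bytes], str]] = None,
) -> str:
    """Return the question as clean text, transcribing audio if needed.

    `transcribe` is a hypothetical callable (audio bytes -> text) that
    would wrap the speech-to-text service; it is injected so this
    function has no network dependency of its own.
    """
    # Typed input takes priority when it is present and non-empty.
    if text is not None and text.strip():
        return " ".join(text.split())  # collapse stray whitespace
    if audio is not None:
        if transcribe is None:
            raise ValueError("audio input needs a transcribe callable")
        return " ".join(transcribe(audio).split())
    raise ValueError("provide either text or audio")
```

Injecting the transcriber keeps the input step testable offline: a stub such as `lambda audio: "hello"` can replace the real API call during development.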

2. Text Generation:

The Mistral-7B-v0.1 model processes the input question and generates a natural response.
Example:
- Question: "What’s your favorite place to relax?"
- Answer: "My room, where I can unwind and enjoy some quiet time."
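As a rough illustration of this step, the sketch below wraps the Hugging Face `transformers` text-generation pipeline. Several things here are assumptions rather than the project's actual setup: the `Question:`/`Answer:` prompt template, the helper names, the sampling settings, and the model identifier, which points at the public base checkpoint `mistralai/Mistral-7B-v0.1` as a placeholder for the fine-tuned weights.

```python
def build_prompt(question: str) -> str:
    """Format a question with an assumed Question/Answer template."""
    return f"Question: {question.strip()}\nAnswer:"


def extract_answer(generated: str, prompt: str) -> str:
    """Strip the echoed prompt and any follow-on question the model invents."""
    completion = generated[len(prompt):] if generated.startswith(prompt) else generated
    # Keep only the text before the model starts a new "Question:" turn.
    completion = completion.split("\nQuestion:")[0]
    return completion.strip()


def generate_answer(
    question: str,
    model_name: str = "mistralai/Mistral-7B-v0.1",  # placeholder for the fine-tuned checkpoint
    max_new_tokens: int = 64,
) -> str:
    """Generate an answer; loading a 7B model needs substantial memory."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model_name)
    prompt = build_prompt(question)
    outputs = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,  # assumed sampling setting
    )
    # By default the pipeline returns the prompt followed by the completion.
    return extract_answer(outputs[0]["generated_text"], prompt)
```

In a long-running service the pipeline would normally be created once and reused, since reloading the model on every call is slow.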

3. Voice Cloning:

The generated text is sent to ElevenLabs’ API, where it is converted into speech using a cloned voice.
The synthesized voice is intended to sound human, with natural intonation and emotion.
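A sketch of what the request to ElevenLabs might look like, using only the standard library. The endpoint path, the `xi-api-key` header, the `model_id` value, and the `voice_settings` fields reflect the public ElevenLabs REST API as best understood here and should be checked against the current documentation. The function names, the voice ID, and the API key are placeholders.

```python
import json
import urllib.request

# Assumed text-to-speech endpoint; verify against the ElevenLabs docs.
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(
    text: str,
    voice_id: str,
    api_key: str,
    model_id: str = "eleven_multilingual_v2",  # assumed model identifier
) -> urllib.request.Request:
    """Build (but do not send) a POST request for speech synthesis."""
    payload = {
        "text": text,
        "model_id": model_id,
        # Illustrative values; tune per voice.
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.75},
    }
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize(text: str, voice_id: str, api_key: str) -> bytes:
    """Send the request and return the raw audio bytes."""
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()
```

Separating request construction from sending makes it possible to inspect the payload without spending API credits.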

4. Output Delivery:

The final output is an audio response in the cloned voice, designed to be hard to tell apart from a real human speaker.
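The four steps can be chained as in the sketch below. It assumes two injected callables, `generate` (question to answer text) and `synthesize` (answer text to audio bytes); both names, and `answer_as_audio` itself, are hypothetical and stand in for the model and ElevenLabs calls.

```python
from pathlib import Path
from typing import Callable, Union


def answer_as_audio(
    question: str,
    generate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
    out_path: Union[str, Path],
) -> Path:
    """Generate an answer, synthesize it, and write the audio to disk."""
    answer = generate(question)
    if not answer.strip():
        # Avoid sending empty text to the speech service.
        raise ValueError("model returned an empty answer")
    audio = synthesize(answer)
    out_path = Path(out_path)
    out_path.write_bytes(audio)
    return out_path
```

With stubs in place of the real services, the flow can be exercised offline, for example `answer_as_audio("Q?", lambda q: "My room.", lambda t: t.encode(), "answer.mp3")`.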

Applications

Gaming: Use in trivia or role-playing games to simulate human-like NPCs.
Storytelling: Create immersive audio experiences by combining generated text with realistic voiceovers.
Social Experiments: Test human reactions to AI-generated, voice-synthesized responses in various scenarios.
Entertainment/Pranks: Surprise players or audiences with a system that convincingly mimics a real human.

Technologies Used

Mistral-7B-v0.1:
- A large language model, fine-tuned for this project to generate answers.
- Delivers contextually accurate and relatable answers.
ElevenLabs API:
- Speech-to-Text: Converts spoken questions into text for the model to process.
- Voice Cloning: Synthesizes text into speech using a cloned voice.
Python:
- Backend logic for integrating text generation, voice synthesis, and API calls.
- Frameworks and libraries include transformers, torch, and API wrappers for ElevenLabs.

Setup Instructions

1. Clone the Repository:

git clone https://github.com/Lirone/NotMe.git
cd NotMe