---
title: Test
emoji: 🐠
colorFrom: pink
colorTo: pink
sdk: docker
pinned: false
---
AI-Powered Question & Answer Generator with Voice Cloning
Overview
This project combines a fine-tuned language model with voice cloning to create an interactive experience in which AI-generated answers are delivered in a cloned voice. The main components are:
- Text Generation: A fine-tuned version of Mistral-7B-v0.1 generates realistic, human-like answers to user-provided questions.
- Voice Cloning: Using the ElevenLabs API, we clone a voice and synthesize the AI-generated answers into natural-sounding speech.
- Deception for Interaction: The system is designed to mislead players by making the responses appear to come from a real human.
Key Features
Fine-Tuned Model for Text Generation:
- The project utilizes the Mistral-7B-v0.1 model fine-tuned on a custom dataset.
- The model generates contextually accurate, human-like responses to a wide range of questions.
Voice Cloning with ElevenLabs:
- ElevenLabs’ Voice Cloning API is used to replicate a target voice, and its Speech-to-Text API can transcribe spoken questions.
- The cloned voice delivers the AI-generated answers in a natural and believable manner.
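A minimal sketch of how the target voice could be registered with ElevenLabs, assuming the instant voice cloning endpoint POST https://api.elevenlabs.io/v1/voices/add (multipart fields name and files) and MP3 voice samples. The helpers encode_multipart and build_add_voice_request are hypothetical, and the project may use an SDK wrapper instead. The code only builds the request with the standard library; sending it is left to the caller.

```python
import urllib.request
import uuid

# Assumed ElevenLabs instant voice cloning endpoint.
ELEVENLABS_ADD_VOICE_URL = "https://api.elevenlabs.io/v1/voices/add"


def encode_multipart(fields, files, boundary=None):
    """Encode text fields and (field, filename, data, mime) file tuples as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    # Plain text form fields, e.g. the voice name.
    for name, value in fields.items():
        chunks.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # Binary file parts, e.g. the audio samples of the target voice.
    for field, filename, data, mime in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        ).encode("utf-8")
        chunks.append(header + data + b"\r\n")
    # Closing boundary.
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_add_voice_request(api_key, name, samples):
    """Build (but do not send) a voice-cloning request; samples are (filename, bytes) pairs."""
    # Assumption: every sample is an MP3 file.
    files = [("files", filename, data, "audio/mpeg") for filename, data in samples]
    body, content_type = encode_multipart({"name": name}, files)
    return urllib.request.Request(
        ELEVENLABS_ADD_VOICE_URL,
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": content_type},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen(req) should return JSON containing the new voice_id, which the speech synthesis step then uses.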
Integration for Immersion:
- The generated answers and synthesized speech are integrated to provide seamless interaction.
- Designed for applications in gaming, interactive storytelling, or prank scenarios.
How It Works
1. Question Input:
- Users provide a question in text form (e.g., "What’s the best way to prepare for a long flight?").
- Alternatively, voice input can be transcribed to text using ElevenLabs’ speech-to-text feature.
2. Text Generation:
- The Mistral-7B-v0.1 model processes the input question and generates a natural response.
- Example:
- Question: "What’s your favorite place to relax?"
- Answer: "My room, where I can unwind and enjoy some quiet time."
3. Voice Cloning:
- The generated text is sent to ElevenLabs’ API, where it is converted into speech using a cloned voice.
- The synthesized speech aims to sound human, with natural intonation and emotion.
4. Output Delivery:
- The final output is an audio response in the cloned voice, designed to be hard to distinguish from a real human speaker.
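The four steps above can be sketched as a small orchestration class. This is a hypothetical structure rather than the project's code: the Pipeline class and its field names are assumptions, and each stage is passed in as a plain function so the real model call and ElevenLabs requests can be plugged in (or replaced by stubs when testing).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Pipeline:
    # Step 2: question text -> answer text (e.g. the fine-tuned Mistral model).
    generate_answer: Callable[[str], str]
    # Step 3: answer text -> audio bytes (e.g. ElevenLabs text-to-speech with the cloned voice).
    synthesize_speech: Callable[[str], bytes]
    # Step 1, optional: spoken question -> text (e.g. ElevenLabs speech-to-text).
    transcribe: Optional[Callable[[bytes], str]] = None

    def run(self, question_text: Optional[str] = None,
            question_audio: Optional[bytes] = None) -> bytes:
        """Run steps 1 to 4 and return the spoken answer as audio bytes."""
        if question_text is None:
            if question_audio is None or self.transcribe is None:
                raise ValueError("Provide question text, or audio plus a transcribe function")
            question_text = self.transcribe(question_audio)
        answer = self.generate_answer(question_text)
        # Step 4: the caller plays, streams, or saves the returned audio.
        return self.synthesize_speech(answer)
```

Keeping the stages as injected functions means the text-only path works without any speech-to-text setup, and the returned bytes can be written to a file or streamed to the player for the output delivery step.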
Applications
- Gaming: Use in trivia or role-playing games to simulate human-like NPCs.
- Storytelling: Create immersive audio experiences by combining generated text with realistic voiceovers.
- Social Experiments: Test human reactions to AI-generated, voice-synthesized responses in various scenarios.
- Entertainment/Pranks: Surprise players or audiences with a system that convincingly mimics a real human.
Technologies Used
Mistral-7B-v0.1:
- An open-weight large language model from Mistral AI, fine-tuned for this project to specialize in text generation.
- Delivers contextually accurate and relatable answers.
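The prompt format used for fine-tuning is not documented here, so the sketch below assumes a hypothetical "Question: ... / Answer:" template. It shows the plumbing around the model: wrapping the question in the template and trimming the raw generation down to a single answer. The model call itself appears only in comments (MODEL_PATH is a placeholder for the fine-tuned checkpoint), since it needs transformers, torch, and the weights.

```python
# Hypothetical prompt template; the format actually used for fine-tuning may differ.
PROMPT_TEMPLATE = "Question: {question}\nAnswer:"


def build_prompt(question: str) -> str:
    """Wrap a user question in the (assumed) fine-tuning template."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def extract_answer(prompt: str, generated: str) -> str:
    """Trim a raw generation down to the first answer."""
    # Decoding the full output sequence usually repeats the prompt; drop it if present.
    if generated.startswith(prompt):
        generated = generated[len(prompt):]
    # Cut at the earliest sign that the model started a new turn.
    for stop in ("\nQuestion:", "\n\n"):
        idx = generated.find(stop)
        if idx != -1:
            generated = generated[:idx]
    return generated.strip()


# Wiring it to the model with transformers (not executed here), roughly:
#   from transformers import AutoModelForCausalLM, AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
#   model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)
#   prompt = build_prompt(question)
#   inputs = tokenizer(prompt, return_tensors="pt")
#   output = model.generate(**inputs, max_new_tokens=64)
#   answer = extract_answer(prompt, tokenizer.decode(output[0], skip_special_tokens=True))
```

For the example question from the How It Works section, build_prompt produces "Question: What’s your favorite place to relax?\nAnswer:", and extract_answer keeps only the text the model writes before it starts another turn.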
ElevenLabs API:
- Speech-to-Text: Converts spoken questions into text for the model to process.
- Voice Cloning: Synthesizes text into speech using a cloned voice.
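The synthesis step could look like the minimal sketch below, assuming ElevenLabs’ text-to-speech REST endpoint POST https://api.elevenlabs.io/v1/text-to-speech/{voice_id}, the eleven_multilingual_v2 model, and illustrative voice_settings values; none of these choices are confirmed by the project. build_tts_request and synthesize are hypothetical helpers built on the standard library only.

```python
import json
import urllib.parse
import urllib.request

# Assumed ElevenLabs text-to-speech endpoint; voice_id is the ID of the cloned voice.
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(api_key, voice_id, text, model_id="eleven_multilingual_v2",
                      stability=0.5, similarity_boost=0.75):
    """Build (but do not send) a text-to-speech request for the cloned voice."""
    payload = {
        "text": text,
        "model_id": model_id,
        # Illustrative values; tune for how closely the output should match the clone.
        "voice_settings": {"stability": stability, "similarity_boost": similarity_boost},
    }
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=urllib.parse.quote(voice_id, safe="")),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",  # ask for MP3 audio back
        },
        method="POST",
    )


def synthesize(api_key, voice_id, text):
    """Send the request and return the raw audio bytes (requires network access)."""
    with urllib.request.urlopen(build_tts_request(api_key, voice_id, text)) as resp:
        return resp.read()
```

Separating request construction from sending keeps the payload easy to inspect and test offline; synthesize is the only function that touches the network.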
Python:
- Backend logic for integrating text generation, voice synthesis, and API calls.
- Frameworks and libraries include transformers, torch, and API wrappers for ElevenLabs.
Setup Instructions
1. Clone the Repository:
git clone https://github.com/Lirone/NotMe.git
cd NotMe