---
title: Test
emoji: 🐠
colorFrom: pink
colorTo: pink
sdk: docker
pinned: false
---
# **AI-Powered Question & Answer Generator with Voice Cloning**
---
## **Overview**
This project pairs a fine-tuned language model with voice cloning to create an interactive experience in which AI-generated answers are delivered in a cloned voice. Its main components are:
1. **Text Generation**: A fine-tuned Mistral-7B-v0.1 model generates realistic, human-like answers to user-provided questions.
2. **Voice Cloning**: The ElevenLabs API clones a voice and synthesizes the AI-generated answers into natural-sounding speech.
3. **Deception for Interaction**: The system is designed to mislead players by making the responses appear to come from a real human.
---
## **Key Features**
1. **Fine-Tuned Model for Text Generation**:
- The project uses the **Mistral-7B-v0.1** model, fine-tuned on a custom dataset.
- The model generates contextually accurate, human-like responses to a wide range of questions.
2. **Voice Cloning with ElevenLabs**:
- ElevenLabs’ **Speech-to-Text and Voice Cloning API** is used to replicate a target voice.
- The cloned voice delivers the AI-generated answers in a natural and believable manner.
3. **Integration for Immersion**:
- The generated answers and synthesized speech are integrated to provide seamless interaction.
- Designed for applications in gaming, interactive storytelling, or prank scenarios.
---
## **How It Works**
### 1. **Question Input**:
- Users provide a question in text form (e.g., "What’s the best way to prepare for a long flight?").
- Alternatively, voice input can be transcribed to text using ElevenLabs’ speech-to-text feature.
### 2. **Text Generation**:
- The Mistral-7B-v0.1 model processes the input question and generates a natural response.
- Example:
- **Question**: "What’s your favorite place to relax?"
- **Answer**: "My room, where I can unwind and enjoy some quiet time."
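
The generation step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Question:`/`Answer:` prompt template, the helper names, and the sampling parameters are assumptions, and the base checkpoint ID `mistralai/Mistral-7B-v0.1` stands in for the fine-tuned checkpoint, whose name is not given here.

```python
def build_prompt(question: str) -> str:
    """Wrap the question in a simple Q/A template (hypothetical format)."""
    return f"Question: {question.strip()}\nAnswer:"


def first_line(completion: str) -> str:
    """Return the first non-empty line, so the model cannot ramble into a new Q/A pair."""
    for line in completion.splitlines():
        if line.strip():
            return line.strip()
    return ""


def generate_answer(question: str, model_id: str = "mistralai/Mistral-7B-v0.1") -> str:
    """Generate a short answer. Loads the model on every call, which is slow;
    a real service would load it once and reuse it."""
    # Imported lazily because transformers and torch are heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    prompt = build_prompt(question)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=40,   # assumed limit; short answers sound more natural when spoken
        do_sample=True,
        temperature=0.7,     # assumed value
    )
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return first_line(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Calling `generate_answer("What's your favorite place to relax?")` would then return a single-line answer, at the cost of downloading the model weights on first use.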
### 3. **Voice Cloning**:
- The generated text is sent to ElevenLabs’ API, where it is converted into speech using a cloned voice.
- The synthesized voice is intended to sound human, with natural intonation and emotion.
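
As a hedged sketch of this step, the snippet below builds a request for ElevenLabs' text-to-speech REST endpoint using only the standard library. The endpoint path, the `xi-api-key` header, and the `eleven_multilingual_v2` model ID follow the public API as commonly documented, but should be checked against the current ElevenLabs reference. The function names, the output path, and the voice ID and API key passed in by the caller are placeholders, not values from this project.

```python
import json
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"


def build_tts_request(text: str, voice_id: str, api_key: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for a given voice."""
    payload = {"text": text, "model_id": model_id}
    return urllib.request.Request(
        url=f"{API_BASE}/text-to-speech/{voice_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "xi-api-key": api_key,
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
        },
        method="POST",
    )


def synthesize(text: str, voice_id: str, api_key: str,
               out_path: str = "answer.mp3") -> str:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Keeping request construction separate from the network call makes the payload easy to inspect and test without spending API credits.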
### 4. **Output Delivery**:
- The final output is an audio response delivered in the cloned voice, intended to be hard to distinguish from a real human speaker.
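
The four steps can be tied together with a small orchestration function. The sketch below is illustrative only: `SpokenAnswer` and `answer_aloud` are hypothetical names, and the text generator and speech synthesizer are passed in as plain callables so the flow can be exercised with stubs instead of the real model and API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpokenAnswer:
    question: str
    answer: str
    audio: bytes


def answer_aloud(question: str,
                 generate: Callable[[str], str],
                 synthesize: Callable[[str], bytes]) -> SpokenAnswer:
    """Run question -> generated text -> synthesized audio."""
    cleaned = question.strip()
    if not cleaned:
        raise ValueError("question must not be empty")
    answer = generate(cleaned)   # step 2: text generation
    audio = synthesize(answer)   # step 3: voice synthesis
    return SpokenAnswer(cleaned, answer, audio)


# Stub callables stand in for the language model and the voice API.
result = answer_aloud(
    " What's your favorite place to relax? ",
    generate=lambda q: "My room.",
    synthesize=lambda text: text.encode("utf-8"),
)
print(result.answer)
```

Injecting the two callables keeps the pipeline independent of any particular model or voice provider, which also makes it straightforward to swap either one.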
---
## **Applications**
- **Gaming**: Use in trivia or role-playing games to simulate human-like NPCs.
- **Storytelling**: Create immersive audio experiences by combining generated text with realistic voiceovers.
- **Social Experiments**: Test human reactions to AI-generated, voice-synthesized responses in various scenarios.
- **Entertainment/Pranks**: Surprise players or audiences with a system that convincingly mimics a real human.
---
## **Technologies Used**
1. **Mistral-7B-v0.1**:
- A large language model, fine-tuned in this project for answer generation.
- Delivers contextually accurate and relatable answers.
2. **ElevenLabs API**:
- **Speech-to-Text**: Converts spoken questions into text for the model to process.
- **Voice Cloning**: Synthesizes text into speech using a cloned voice.
3. **Python**:
- Backend logic for integrating text generation, voice synthesis, and API calls.
- Frameworks and libraries include `transformers`, `torch`, and API wrappers for ElevenLabs.
---
## **Setup Instructions**
### 1. **Clone the Repository**:
```bash
git clone https://github.com/Lirone/NotMe.git
cd NotMe
```