---
title: Test
emoji: 🐠
colorFrom: pink
colorTo: pink
sdk: docker
pinned: false
---


# **AI-Powered Question & Answer Generator with Voice Cloning**

---

## **Overview**

This project generates answers to user questions with a fine-tuned language model and delivers them as speech in a cloned voice. Its primary components are:

1. **Text Generation**: A fine-tuned Mistral-7B-v0.1 model generates realistic, human-like answers to user-provided questions.
2. **Voice Cloning**: Using the ElevenLabs API, we clone a voice and synthesize the AI-generated answers into natural-sounding speech.
3. **Deception for Interaction**: The system is designed to mislead players by making the responses appear to come from a real human.

---

## **Key Features**

1. **Fine-Tuned Model for Text Generation**:
   - The project utilizes the **Mistral-7B-v0.1** model fine-tuned on a custom dataset.
   - The model generates contextually accurate, human-like responses to a wide range of questions.

2. **Voice Cloning with ElevenLabs**:
   - ElevenLabs’ **Voice Cloning API** is used to replicate a target voice, and its text-to-speech synthesis renders the answers in that voice.
   - The cloned voice delivers the AI-generated answers in a natural and believable manner.

3. **Integration for Immersion**:
   - The generated answers and synthesized speech are integrated to provide seamless interaction.
   - Designed for applications in gaming, interactive storytelling, or prank scenarios.

---

## **How It Works**

### 1. **Question Input**:
   - Users provide a question in text form (e.g., "What’s the best way to prepare for a long flight?").
   - Alternatively, voice input can be transcribed to text using ElevenLabs’ speech-to-text feature.

### 2. **Text Generation**:
   - The Mistral-7B-v0.1 model processes the input question and generates a natural response.
   - Example:
     - **Question**: "What’s your favorite place to relax?"
     - **Answer**: "My room, where I can unwind and enjoy some quiet time."
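
The text generation step can be sketched in Python as follows. This is a minimal illustration, not the repository's actual code: the prompt template, the helper names (`build_prompt`, `extract_answer`, `load_generator`, `answer_question`), and the sampling parameters are all assumptions, and `mistralai/Mistral-7B-v0.1` is the public base checkpoint rather than the project's fine-tuned weights.

```python
from typing import Callable

# Hypothetical prompt template; the format used during fine-tuning is not documented here.
PROMPT_TEMPLATE = "Question: {question}\nAnswer:"


def build_prompt(question: str) -> str:
    """Wrap the user's question in the (assumed) prompt template."""
    return PROMPT_TEMPLATE.format(question=question.strip())


def extract_answer(generated_text: str, prompt: str) -> str:
    """Drop the echoed prompt and keep only the first line of the completion."""
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text
    stripped = completion.strip()
    return stripped.splitlines()[0].strip() if stripped else ""


def load_generator(model_name: str = "mistralai/Mistral-7B-v0.1") -> Callable[[str], str]:
    """Return a function mapping a prompt to raw generated text.

    The import is local so the helpers above work without transformers installed.
    Swap model_name for the path of the fine-tuned checkpoint.
    """
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_name)

    def generate(prompt: str) -> str:
        # Sampling settings are illustrative, not tuned values from the project.
        outputs = pipe(prompt, max_new_tokens=64, do_sample=True, temperature=0.7)
        return outputs[0]["generated_text"]

    return generate


def answer_question(question: str, generate: Callable[[str], str]) -> str:
    """Build the prompt, run the generator, and return the cleaned answer."""
    prompt = build_prompt(question)
    return extract_answer(generate(prompt), prompt)
```

Passing the generator in as a callable keeps the prompt handling testable without loading a 7B-parameter model; in practice, `answer_question(question, load_generator())` would run the real model.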

### 3. **Voice Cloning**:
   - The generated text is sent to ElevenLabs’ API, where it is converted into speech using a cloned voice.
   - The voice sounds human, complete with natural intonation and emotion.
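
A sketch of how the generated text might be sent to ElevenLabs, using only the Python standard library. The endpoint path and the `xi-api-key` header reflect ElevenLabs' public REST API and should be checked against the current API reference; the function names, the minimal JSON body, and the timeout are assumptions, and the voice ID and API key are placeholders supplied by the caller.

```python
import json
import urllib.request

# Text-to-speech endpoint; {voice_id} identifies the cloned voice.
ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a text-to-speech request for the given voice."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=payload,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, voice_id: str, api_key: str) -> bytes:
    """Send the request and return the audio bytes from the response body."""
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()
```

Separating request construction from the network call makes it possible to inspect the URL, headers, and body without spending API credits.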

### 4. **Output Delivery**:
   - The final output is an audio response delivered in the cloned voice, intended to sound like a real human speaker.
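
The four steps above can be tied together roughly as follows. This is a hypothetical glue function, not the repository's implementation: `run_pipeline` and its parameters are illustrative names, and the text-generation and speech-synthesis steps are passed in as callables so that any backend can be plugged in.

```python
from pathlib import Path
from typing import Callable


def run_pipeline(
    question: str,
    generate_answer: Callable[[str], str],
    synthesize_speech: Callable[[str], bytes],
    output_path: Path,
) -> tuple[str, Path]:
    """Generate a text answer, synthesize it, and write the audio to disk."""
    answer = generate_answer(question)
    if not answer:
        # Avoid sending an empty string to the speech API.
        raise ValueError("the model returned an empty answer")
    audio = synthesize_speech(answer)
    output_path.write_bytes(audio)
    return answer, output_path
```

For example, `run_pipeline("What’s your favorite place to relax?", my_model, my_tts, Path("answer.mp3"))` would return the answer text and the path of the written audio file, where `my_model` and `my_tts` stand in for the real generation and synthesis functions.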

---

## **Applications**

- **Gaming**: Use in trivia or role-playing games to simulate human-like NPCs.
- **Storytelling**: Create immersive audio experiences by combining generated text with realistic voiceovers.
- **Social Experiments**: Test human reactions to AI-generated, voice-synthesized responses in various scenarios.
- **Entertainment/Pranks**: Surprise players or audiences with a system that convincingly mimics a real human.

---

## **Technologies Used**

1. **Mistral-7B-v0.1**:
   - A 7-billion-parameter large language model, fine-tuned for this project on a custom dataset.
   - Delivers contextually accurate and relatable answers.

2. **ElevenLabs API**:
   - **Speech-to-Text**: Converts spoken questions into text for the model to process.
   - **Voice Cloning**: Synthesizes text into speech using a cloned voice.

3. **Python**:
   - Backend logic for integrating text generation, voice synthesis, and API calls.
   - Frameworks and libraries include `transformers`, `torch`, and API wrappers for ElevenLabs.

---

## **Setup Instructions**

### 1. **Clone the Repository**:
   ```bash
   git clone https://github.com/Lirone/NotMe.git
   cd NotMe