---
library_name: transformers
license: apache-2.0
language:
- en
base_model:
- TinyLlama/TinyLlama_v1.1
---

# Vikhr Salt: Speech And Language Transformer

![Vikhr Salt Logo](https://huggingface.co./Vikhrmodels/salt-116k/resolve/main/IMG_1304%20copy.png)

Vikhr Salt is a multimodal model built on a pre-trained large language model whose vocabulary is extended with new audio tokens, so that a single model handles both TTS (text-to-speech) and ASR (automatic speech recognition). Two audio encoding variants are used, Encodec and SpeechTokenizer, and training is kept stable by tuning the numerical precision settings. This approach lets Vikhr Salt reuse the knowledge already present in the base LLM while both generating and understanding speech.
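
To make the idea of "new audio tokens" concrete, the following is a minimal sketch of one common way discrete codec codes can be mapped onto token IDs placed after a text vocabulary. It is not taken from the Vikhr Salt code base: the vocabulary size, codebook size, number of codebooks, and all function names are assumptions chosen for illustration, and the actual model may lay out its audio tokens differently.

```python
from typing import List, Tuple

# All three constants are illustrative assumptions, not values from this model card.
TEXT_VOCAB_SIZE = 32_000  # assumed size of the base LLM's text vocabulary
CODEBOOK_SIZE = 1_024     # assumed number of entries in each audio codebook
NUM_CODEBOOKS = 2         # assumed number of codebooks kept per audio frame


def audio_code_to_token_id(codebook: int, code: int) -> int:
    """Map a (codebook index, code) pair to a token ID after the text vocabulary."""
    if not 0 <= codebook < NUM_CODEBOOKS:
        raise ValueError(f"codebook index out of range: {codebook}")
    if not 0 <= code < CODEBOOK_SIZE:
        raise ValueError(f"code out of range: {code}")
    return TEXT_VOCAB_SIZE + codebook * CODEBOOK_SIZE + code


def token_id_to_audio_code(token_id: int) -> Tuple[int, int]:
    """Invert audio_code_to_token_id; raises if the ID is not an audio token."""
    offset = token_id - TEXT_VOCAB_SIZE
    if not 0 <= offset < NUM_CODEBOOKS * CODEBOOK_SIZE:
        raise ValueError(f"not an audio token ID: {token_id}")
    codebook, code = divmod(offset, CODEBOOK_SIZE)
    return codebook, code


def flatten_frames(frames: List[List[int]]) -> List[int]:
    """Interleave per-frame codes (one code per codebook) into one token sequence."""
    return [
        audio_code_to_token_id(codebook, code)
        for frame in frames
        for codebook, code in enumerate(frame)
    ]
```

Under these assumptions, text tokens keep their original IDs and every audio code receives a unique ID above them, so the language model can read or emit speech as an ordinary token sequence: audio in and text out for ASR, text in and audio out for TTS.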

## Model Authors

Ksenya Sycheva, Konstantin Korolev, Aleksandr Nikolic