---
library_name: transformers
tags:
- persian
- whisper-base
- whisper
- farsi
- Neura
- NeuraSpeech
license: apache-2.0
language:
- fa
pipeline_tag: automatic-speech-recognition
---


# NeuraSpeech

<p align="center">
  <img src="neura_speech_2.png" width=512 height=256 />
</p>



## Model Description


- **Developed by:** Neura
- **Funded by:** Neura
- **Model type:** Whisper Base
- **Language(s) (NLP):** Persian

## Model Architecture

Whisper is a Transformer-based encoder-decoder model, also referred to as a sequence-to-sequence model.
It is a pre-trained model for automatic speech recognition (ASR) and speech translation.
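To make the sequence-to-sequence idea concrete, here is a toy, pure-Python sketch of greedy autoregressive decoding: an encoder turns the input into a hidden representation once, and a decoder then emits one token at a time, each conditioned on that representation and on the tokens produced so far, until it emits an end-of-sequence token. The encoder, decoder step, and token ids below are hypothetical stand-ins, not Whisper's networks or vocabulary. In the real model the encoder consumes log-mel features, the decoder is a Transformer over a subword vocabulary, and `model.generate` runs a loop of this kind internally.

```python
from typing import Callable, List

# Hypothetical special-token ids for this toy vocabulary (NOT Whisper's real ids).
BOS, EOS = 0, 1
VOCAB_SIZE = 5


def toy_encoder(features: List[float]) -> List[int]:
    """Hypothetical encoder: maps input features to a hidden representation.
    Here it just rounds each value; Whisper's encoder is a Transformer over log-mel frames."""
    return [round(x) for x in features]


def toy_next_token_scores(encoder_states: List[int], prefix: List[int]) -> List[float]:
    """Hypothetical decoder step: scores every vocabulary item given the encoder
    output and the tokens generated so far. It simply 'reads off' the next state."""
    position = len(prefix) - 1  # number of tokens generated after BOS
    target = encoder_states[position] if position < len(encoder_states) else EOS
    return [1.0 if token == target else 0.0 for token in range(VOCAB_SIZE)]


def greedy_decode(
    encoder_states: List[int],
    next_token_scores: Callable[[List[int], List[int]], List[float]],
    max_len: int = 10,
) -> List[int]:
    """Greedy decoding: repeatedly append the highest-scoring token until EOS or max_len."""
    tokens = [BOS]
    for _ in range(max_len):
        scores = next_token_scores(encoder_states, tokens)
        best = max(range(len(scores)), key=scores.__getitem__)  # argmax over the vocabulary
        tokens.append(best)
        if best == EOS:
            break
    return tokens


states = toy_encoder([2.1, 2.9, 4.2])
print(greedy_decode(states, toy_next_token_scores))  # [0, 2, 3, 4, 1]
```

The encoder runs once per input, while the decoder is called once per generated token; this asymmetry is why decoding length, not audio length, usually dominates generation time.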

## Uses
Check out the Google Colab demo to run NeuraSpeech ASR on a free-tier Google Colab instance: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12d7zecB94ah7ZHKnDtJF58saLzdkZAj3#scrollTo=oNt032WVkQUa)



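Make sure `transformers`, `librosa`, and `torch` are installed (for example, with `pip install transformers librosa torch`). In a notebook, you can first listen to the sample audio:

```python
from IPython.display import Audio, display

# play the local sample file "persian_audio.mp3" inline in a Jupyter/Colab notebook
display(Audio("persian_audio.mp3", autoplay=True))
```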

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

# load model and processor
processor = WhisperProcessor.from_pretrained("Neurai/NeuraSpeech_WhisperBase")
model = WhisperForConditionalGeneration.from_pretrained("Neurai/NeuraSpeech_WhisperBase")
# decoder prompt that forces Persian transcription (no language detection or translation)
forced_decoder_ids = processor.get_decoder_prompt_ids(language="fa", task="transcribe")

# load the audio as mono and resample it to the 16 kHz rate Whisper expects
sr = 16000
array, _ = librosa.load("persian_audio.mp3", sr=sr, mono=True)
input_features = processor(array, sampling_rate=sr, return_tensors="pt").input_features

# generate token ids
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
# decode token ids to text, dropping special tokens
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription)
```
Transcribed text (English: "He/She wanted to free the slaves."):
```
او خواهان آزاد کردن بردگان بود
```
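The preprocessing in the usage example (resampling to 16 kHz, then calling `processor(...)`) fixes the shape of the model input. The sketch below works that out with plain arithmetic. The constants are the standard Whisper feature-extractor settings (16 kHz sampling, hop length 160, a 30-second window, 80 mel bins for whisper-base); we assume this fine-tuned checkpoint keeps them, which you can check via attributes such as `processor.feature_extractor.feature_size` and `processor.feature_extractor.hop_length`. The helper names are hypothetical. With the feature extractor's default settings, shorter clips are padded and longer clips are truncated to 30 seconds, so only the first 30 seconds of a long recording are transcribed by this call.

```python
import math

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio
HOP_LENGTH = 160      # samples between consecutive feature frames (10 ms at 16 kHz)
CHUNK_SECONDS = 30    # inputs are padded or truncated to a 30-second window
N_MELS = 80           # mel bins for whisper-base (assumed unchanged by fine-tuning)


def resampled_length(num_samples: int, orig_sr: int, target_sr: int = SAMPLE_RATE) -> int:
    """Number of samples after resampling: scale by target_sr / orig_sr and round up."""
    return math.ceil(num_samples * target_sr / orig_sr)


def whisper_feature_shape(batch_size: int = 1) -> tuple:
    """Shape of `input_features` for `batch_size` clips, each padded/truncated to 30 s."""
    n_samples = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples
    n_frames = n_samples // HOP_LENGTH       # 3,000 frames
    return (batch_size, N_MELS, n_frames)


# A 5-second clip loaded at librosa's default 22,050 Hz has 110,250 samples.
print(resampled_length(110_250, 22_050))  # 80000
print(whisper_feature_shape())            # (1, 80, 3000)
```

Because every clip maps to the same `(batch, 80, 3000)` shape, several clips can be batched into one `generate` call without extra padding logic on your side.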


## More Information
https://neura.info

## Model Card Authors
Esmaeil Zahedi, Mohsen Yazdinejad

## Model Card Contact
[email protected]