	---
	library_name: transformers
	tags:
	- persian
	- whisper-base
	- whisper
	- farsi
	- Neura
	- NeuraSpeech
	license: apache-2.0
	language:
	- fa
	pipeline_tag: automatic-speech-recognition
	---


	# NeuraSpeech_WhisperBase

	<p align="center">
	<img src="neura_speech_2.png" width=512 height=256 />
	</p>


	NeuraSpeech_WhisperBase is a Whisper Base model for Persian (Farsi) automatic speech recognition, developed by Neura.

	## Model Description

	- Developed by: Neura
	- Funded by: Neura
	- Model type: Whisper Base
	- Language(s) (NLP): Persian

	## Model Architecture

	This model uses the Whisper Base architecture, an encoder-decoder Transformer. The encoder reads an 80-channel log-Mel spectrogram computed from 16 kHz audio in 30-second windows, and the decoder generates the transcript token by token, starting from special tokens that select the language and the task.
	For more details on Whisper, see [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356).
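As a rough illustration of the shapes involved, the sketch below works through the arithmetic for one 30-second window under Whisper's standard front end (16 kHz audio, a 10 ms hop between log-Mel frames, 80 Mel channels for Whisper Base, and a stride-2 convolution at the start of the encoder). It also lists the special tokens the decoder starts from when asked to transcribe Persian. `whisper_input_shapes` and `decoder_prompt` are hypothetical helpers written for illustration; they are not part of `transformers`.

```python
from __future__ import annotations

# Standard Whisper front-end values (assumed here for Whisper Base).
SAMPLE_RATE = 16_000   # Hz; Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30     # fixed input window length in seconds
HOP_LENGTH = 160       # samples between log-Mel frames (10 ms at 16 kHz)
N_MELS = 80            # Mel channels per frame for Whisper Base
ENCODER_STRIDE = 2     # the encoder's second convolution halves the frame count


def whisper_input_shapes(seconds: int = CHUNK_SECONDS) -> dict:
    """Sizes involved when Whisper Base processes `seconds` of 16 kHz audio."""
    n_samples = seconds * SAMPLE_RATE
    n_frames = n_samples // HOP_LENGTH          # log-Mel frames given to the encoder
    n_positions = n_frames // ENCODER_STRIDE    # encoder output positions
    return {
        "samples": n_samples,
        "mel_shape": (N_MELS, n_frames),
        "encoder_positions": n_positions,
    }


def decoder_prompt(language: str = "fa", task: str = "transcribe",
                   timestamps: bool = False) -> list[str]:
    """Special tokens the Whisper decoder is primed with for a language and task."""
    prompt = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Without timestamp prediction, Whisper adds the no-timestamps token.
        prompt.append("<|notimestamps|>")
    return prompt


print(whisper_input_shapes())  # {'samples': 480000, 'mel_shape': (80, 3000), 'encoder_positions': 1500}
print(decoder_prompt())        # ['<|startoftranscript|>', '<|fa|>', '<|transcribe|>', '<|notimestamps|>']
```

The 30-second window is why audio longer than 30 seconds needs to be split before transcription, and the language token is how a multilingual Whisper checkpoint is pointed at Persian.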

	## Uses
	Check out the Google Colab demo to run NeuraSpeech ASR on a free-tier Google Colab instance: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12d7zecB94ah7ZHKnDtJF58saLzdkZAj3#scrollTo=oNt032WVkQUa)



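	Make sure `transformers`, `torch`, and `librosa` are installed (for example, `pip install transformers torch librosa`). In a notebook, you can first listen to the sample audio: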

	```python
	from IPython.display import Audio, display

	# play the sample file in a Jupyter/Colab notebook
	display(Audio("persian_audio.mp3", autoplay=True))
	```

	```python
	from transformers import WhisperProcessor, WhisperForConditionalGeneration
	import librosa

	# load model and processor
	processor = WhisperProcessor.from_pretrained("Neurai/NeuraSpeech_WhisperBase")
	model = WhisperForConditionalGeneration.from_pretrained("Neurai/NeuraSpeech_WhisperBase")

	# load the audio as a 16 kHz mono waveform (librosa resamples and downmixes on load)
	array, sr = librosa.load("persian_audio.mp3", sr=16000, mono=True)
	input_features = processor(array, sampling_rate=sr, return_tensors="pt").input_features

	# generate token ids, forcing Persian transcription
	predicted_ids = model.generate(input_features, language="fa", task="transcribe")
	# decode token ids to text
	transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
	print(transcription)
	```
	Transcribed text (in English: "He/she wanted to free the slaves."):
	```
	او خواهان آزاد کردن بردگان بود
	```
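The example above passes the whole file to the processor, which pads or truncates the features to Whisper's 30-second window, so longer recordings are cut off. A minimal workaround, sketched below, splits the waveform into 30-second chunks and transcribes them as one batch. `split_into_chunks` and `transcribe_long` are hypothetical helpers, and passing `language`/`task` to `generate` assumes a reasonably recent `transformers` version.

```python
SAMPLE_RATE = 16_000   # Whisper's expected sampling rate
CHUNK_SECONDS = 30     # Whisper processes at most 30 s per window


def split_into_chunks(array, sr=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split a 1-D waveform (list or NumPy array) into consecutive chunks of at most chunk_seconds."""
    size = sr * chunk_seconds
    return [array[start:start + size] for start in range(0, len(array), size)]


def transcribe_long(array, processor, model, sr=SAMPLE_RATE):
    """Transcribe each chunk and join the texts (naive: a word at a chunk edge may be cut)."""
    chunks = split_into_chunks(array, sr)
    if not chunks:
        return ""
    # The processor pads every chunk to 30 s and stacks them into one batch.
    inputs = processor(chunks, sampling_rate=sr, return_tensors="pt")
    predicted_ids = model.generate(inputs.input_features, language="fa", task="transcribe")
    texts = processor.batch_decode(predicted_ids, skip_special_tokens=True)
    return " ".join(text.strip() for text in texts)
```

With the `processor`, `model`, and `array` from the example above, `print(transcribe_long(array, processor, model))` prints a single joined transcript. Because chunks are cut at fixed boundaries, a word that straddles a boundary may be split; the `transformers` automatic-speech-recognition `pipeline` with `chunk_length_s=30` uses overlapping chunks and is a more robust alternative.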


	## More Information
	https://neura.info

	## Model Card Authors
	Esmaeil Zahedi, Mohsen Yazdinejad

	## Model Card Contact
	[email protected]