	---
	license: mit
	language: et
	tags:
	- audio
	- automatic-speech-recognition
	#widget:
	#- example_title: Librispeech sample 1
	# src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
	#- example_title: Librispeech sample 2
	# src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
	pipeline_tag: automatic-speech-recognition
	base_model:
	- openai/whisper-large-v3-turbo
	library_name: transformers
	---

	## Introduction

	This model is OpenAI Whisper large-v3-turbo, fine-tuned on ~770 hours of manually created subtitles from Estonian TV (ETV).
	Because the training targets are subtitles rather than verbatim transcripts, the model does not always produce
	word-by-word output: it often rephrases sentences and compresses the text, especially for spontaneous speech,
	hesitations, repetitions, etc. However, the length of the generated text chunks almost always conforms to the
	ETV subtitle requirements (48 characters per line).

	## Usage



	This is a fine-tuned version of Whisper large-v3-turbo and can therefore be used via Hugging Face 🤗 Transformers. To run the model, first install the Transformers
	library. For this example, we'll also install 🤗 Accelerate to reduce the model loading time:

	```bash
	pip install --upgrade pip
	pip install --upgrade transformers accelerate
	```

	The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
	class to transcribe audio of arbitrary length:

	```python
	import torch
	from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

	# Use the GPU with half precision when available; otherwise fall back to CPU.
	device = "cuda:0" if torch.cuda.is_available() else "cpu"
	torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

	model_id = "TalTechNLP/whisper-large-v3-turbo-et-subs"

	model = AutoModelForSpeechSeq2Seq.from_pretrained(
	    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
	)
	model.to(device)

	# The processor bundles the tokenizer and the audio feature extractor.
	processor = AutoProcessor.from_pretrained(model_id)

	pipe = pipeline(
	    "automatic-speech-recognition",
	    model=model,
	    tokenizer=processor.tokenizer,
	    feature_extractor=processor.feature_extractor,
	    torch_dtype=torch_dtype,
	    device=device,
	)

	# Path to a local audio file to transcribe.
	audio = "sample.mp3"

	result = pipe(audio, generate_kwargs={"task": "transcribe", "language": "et"})
	print(result)
	```
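
Since the model produces subtitle-style text, it can be convenient to write the output to a subtitle file. The following is a minimal sketch and not part of the original model card. It assumes the pipeline is called with `return_timestamps=True`, in which case the result contains a `chunks` list of entries shaped like `{"timestamp": (start, end), "text": ...}`. The helper names `format_srt_time` and `chunks_to_srt` are hypothetical, and the example chunks are hand-written stand-ins, not real model output.

```python
def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks):
    """Convert pipeline-style timestamped chunks into SRT-formatted text."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # Assumption: if the final chunk has no end time, reuse its start time.
            end = start
        text = chunk["text"].strip()
        entries.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
    # SRT entries are separated by a blank line.
    return "\n".join(entries)


# Hand-written chunks standing in for real pipeline output (illustrative only).
example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " Tere õhtust."},
    {"timestamp": (2.5, 5.04), "text": " Algab saade."},
]
srt_text = chunks_to_srt(example_chunks)
print(srt_text)
```

With the pipeline from the example above, the same helper could be applied as `chunks_to_srt(pipe(audio, return_timestamps=True, generate_kwargs={"task": "transcribe", "language": "et"})["chunks"])`. How well the chunk boundaries match the intended subtitle segmentation has not been verified here.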

	## Evaluation results

	TODO