TanelAlumae committed on
Commit
7a2a4b2
·
verified ·
1 Parent(s): 3f2ee15

Update README.md

  1. README.md +75 -3
README.md CHANGED
@@ -1,3 +1,75 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ language: et
4
+ tags:
5
+ - audio
6
+ - automatic-speech-recognition
7
+ #widget:
8
+ #- example_title: Librispeech sample 1
9
+ # src: https://cdn-media.huggingface.co/speech_samples/sample1.flac
10
+ #- example_title: Librispeech sample 2
11
+ # src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
12
+ pipeline_tag: automatic-speech-recognition
13
+ base_model:
14
+ - openai/whisper-large-v3-turbo
15
+ library_name: transformers
16
+ ---
17
+
18
+ ## Introduction
19
+
20
+ This model is OpenAI Whisper large-v3-turbo, finetuned on ~770 hours of manually created subtitles from Estonian TV (ETV).
21
+ Therefore, this model does not always create verbatim (word-by-word) subtitles but often rephrases the sentences and
22
+ compresses text, especially in the case of spontaneous speech, hesitations, repetitions, etc. However, the length
23
+ of the generated text chunks almost always conforms to the ETV subtitle requirements (48 characters per line).
24
+
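The 48-character limit mentioned above is easy to verify on generated text. The following is a minimal sketch, not part of the model or of Transformers: the helper name `overlong_lines` and the sample text are hypothetical, and it assumes that a subtitle chunk separates its lines with newline characters.

```python
MAX_CHARS_PER_LINE = 48  # limit quoted in this README for ETV subtitles


def overlong_lines(subtitle_text, limit=MAX_CHARS_PER_LINE):
    """Return the lines of a subtitle chunk that exceed the character limit."""
    return [line for line in subtitle_text.splitlines() if len(line) > limit]


# Hypothetical subtitle chunk, used only to illustrate the check
chunk = "Tere õhtust, head vaatajad.\nAlgab saade."
print(overlong_lines(chunk))
```

An empty list means every line fits within the limit. Since the model conforms "almost always" rather than always, a check like this can flag the occasional chunk that needs manual splitting.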
25
+ ## Usage
26
+
27
+
28
+
29
+ This is a finetuned version of Whisper large-v3-turbo and can therefore be used via Hugging Face 🤗 Transformers. To run the model, first install the Transformers
30
+ library. For this example, we'll also install 🤗 Accelerate to reduce the model loading time:
31
+
32
+ ```bash
33
+ pip install --upgrade pip
34
+ pip install --upgrade transformers accelerate
35
+ ```
36
+
37
+ The model can be used with the [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline)
38
+ class to transcribe audio of arbitrary length:
39
+
40
+ ```python
41
+ import torch
42
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
44
+
45
+
46
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
47
+ torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
48
+
49
+ model_id = "TalTechNLP/whisper-large-v3-turbo-et-subs"
50
+
51
+ model = AutoModelForSpeechSeq2Seq.from_pretrained(
52
+ model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
53
+ )
54
+ model.to(device)
55
+
56
+ processor = AutoProcessor.from_pretrained(model_id)
57
+
58
+ pipe = pipeline(
59
+ "automatic-speech-recognition",
60
+ model=model,
61
+ tokenizer=processor.tokenizer,
62
+ feature_extractor=processor.feature_extractor,
63
+ torch_dtype=torch_dtype,
64
+ device=device,
65
+ )
66
+
67
+ audio = "sample.mp3"
68
+
69
+ result = pipe(audio, generate_kwargs={"task": "transcribe", "language": "et"})
70
+ print(result)
71
+ ```
72
+
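Since the model produces subtitle-style text, it can be useful to write the output as an SRT file. The sketch below assumes the pipeline was called with `return_timestamps=True`, in which case the result contains a `"chunks"` list of dictionaries with `"timestamp"` and `"text"` keys. The helper names `format_timestamp` and `chunks_to_srt` and the sample chunks are hypothetical and only illustrate the conversion.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def chunks_to_srt(chunks):
    """Convert {"timestamp": (start, end), "text": str} dicts into SRT text."""
    entries = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:  # the end time of the final chunk may be missing
            end = start
        entries.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(entries)


# Hypothetical data shaped like pipe(audio, return_timestamps=True)["chunks"]
example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " Tere õhtust."},
    {"timestamp": (2.5, 5.04), "text": " Algab saade."},
]
print(chunks_to_srt(example_chunks))
```

With a real run, `example_chunks` would be replaced by the `"chunks"` entry of the pipeline result. Falling back to the start time when the end time is missing is a simplification; a fixed display duration may be preferable in practice.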
73
+ ## Evaluation results
74
+
75
+ TODO