GetmanY1 committed
Commit 7fc5bcb · verified · 1 Parent(s): 551135b

Update README.md

Files changed (1)
  1. README.md +154 -3
README.md CHANGED
---
license: apache-2.0
tags:
- automatic-speech-recognition
- fi
- finnish
library_name: transformers
language: fi
base_model:
- GetmanY1/wav2vec2-large-fi-150k
model-index:
- name: wav2vec2-large-fi-150k-finetuned
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Lahjoita puhetta (Donate Speech)
      type: lahjoita-puhetta
      args: fi
    metrics:
    - name: Dev WER
      type: wer
      value: 15.34
    - name: Dev CER
      type: cer
      value: 4.14
    - name: Test WER
      type: wer
      value: 16.86
    - name: Test CER
      type: cer
      value: 5.07
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Finnish Parliament
      type: FinParl
      args: fi
    metrics:
    - name: Dev16 WER
      type: wer
      value: 11.3
    - name: Dev16 CER
      type: cer
      value: 4.75
    - name: Test16 WER
      type: wer
      value: 8.29
    - name: Test16 CER
      type: cer
      value: 3.34
    - name: Test20 WER
      type: wer
      value: 6.94
    - name: Test20 CER
      type: cer
      value: 2.15
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 16.1
      type: mozilla-foundation/common_voice_16_1
      args: fi
    metrics:
    - name: Dev WER
      type: wer
      value: 7.17
    - name: Dev CER
      type: cer
      value: 1.11
    - name: Test WER
      type: wer
      value: 5.86
    - name: Test CER
      type: cer
      value: 0.91
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      args: fi_fi
    metrics:
    - name: Dev WER
      type: wer
      value: 9.2
    - name: Dev CER
      type: cer
      value: 5.23
    - name: Test WER
      type: wer
      value: 10.69
    - name: Test CER
      type: cer
      value: 5.79
---

# Finnish Wav2vec2-Large ASR

[GetmanY1/wav2vec2-large-fi-150k](https://huggingface.co/GetmanY1/wav2vec2-large-fi-150k) fine-tuned on 4600 hours of 16 kHz sampled Finnish speech:
* 1500 hours of [Lahjoita puhetta (Donate Speech)](https://link.springer.com/article/10.1007/s10579-022-09606-3) (colloquial Finnish)
* 3100 hours of the [Finnish Parliament dataset](https://link.springer.com/article/10.1007/s10579-023-09650-7)

When using the model, make sure that your speech input is also sampled at 16 kHz.

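If your recordings use a different sampling rate, resample them to 16 kHz before passing them to the model, for example with torchaudio (a minimal sketch; `audio.wav` is a placeholder path):

```python
import torchaudio

# load the file at its native sampling rate
waveform, sample_rate = torchaudio.load("audio.wav")
if sample_rate != 16_000:
    # resample to the 16 kHz rate the model expects
    waveform = torchaudio.functional.resample(waveform, orig_freq=sample_rate, new_freq=16_000)
```
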
## Model description

The Finnish Wav2Vec2 Large has the same architecture and uses the same training objective as the English and multilingual models described in [the original wav2vec 2.0 paper](https://arxiv.org/abs/2006.11477).

[GetmanY1/wav2vec2-large-fi-150k](https://huggingface.co/GetmanY1/wav2vec2-large-fi-150k) is a large-scale, 317-million-parameter monolingual model pre-trained on 158k hours of unlabeled Finnish speech, including [KAVI radio and television archive materials](https://kavi.fi/en/radio-ja-televisioarkistointia-vuodesta-2008/), Lahjoita puhetta (Donate Speech), the Finnish Parliament dataset, and Finnish VoxPopuli.

You can read more about the pre-trained model in [this paper](TODO). The training scripts are available on [GitHub](https://github.com/aalto-speech/large-scale-monolingual-speech-foundation-models).

## Intended uses

You can use this model for Finnish ASR (speech-to-text).

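For a quick test, the model should also work with the 🤗 Transformers ASR `pipeline` (a minimal sketch; `audio.wav` is a placeholder path, and decoding some audio formats requires ffmpeg):

```python
from transformers import pipeline

# build an ASR pipeline around the fine-tuned model and transcribe a local file
asr = pipeline("automatic-speech-recognition", model="GetmanY1/wav2vec2-large-fi-150k-finetuned")
print(asr("audio.wav")["text"])
```
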
### How to use

To transcribe audio files, the model can be used as a standalone acoustic model as follows:

```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset, Audio
import torch

# load model and processor
processor = Wav2Vec2Processor.from_pretrained("GetmanY1/wav2vec2-large-fi-150k-finetuned")
model = Wav2Vec2ForCTC.from_pretrained("GetmanY1/wav2vec2-large-fi-150k-finetuned")

# load the Common Voice Finnish test split
# (the dataset is gated on the Hub, so you may need to accept its terms and log in first)
ds = load_dataset("mozilla-foundation/common_voice_16_1", "fi", split="test")
# make sure the audio is decoded at 16 kHz
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# extract input features
input_values = processor(
    ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt", padding="longest"
).input_values  # batch size 1

# retrieve logits
with torch.no_grad():
    logits = model(input_values).logits

# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
print(transcription[0])
```
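
As a rough sketch of how word and character error rates (the metrics reported above) can be computed for your own predictions, the `jiwer` package can be used; this is an assumption, since the card does not state which evaluation tooling or text normalization was used, and both affect the numbers. Continuing from the snippet above:

```python
# a sketch only: jiwer and the lowercasing below are assumptions,
# not the evaluation protocol behind the reported metrics
import jiwer

reference = ds[0]["sentence"].lower()   # ground-truth transcript from Common Voice
hypothesis = transcription[0].lower()   # model output from the snippet above
print("WER:", jiwer.wer(reference, hypothesis))
print("CER:", jiwer.cer(reference, hypothesis))
```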

## Team Members

- Yaroslav Getman, [Hugging Face profile](https://huggingface.co/GetmanY1), [LinkedIn profile](https://www.linkedin.com/in/yaroslav-getman/)
- Tamas Grosz, [Hugging Face profile](https://huggingface.co/Grosy), [LinkedIn profile](https://www.linkedin.com/in/tam%C3%A1s-gr%C3%B3sz-950a049a/)

Feel free to contact us for more details 🤗