---
language:
- ur
license: apache-2.0
tags:
- automatic-speech-recognition
- hf-asr-leaderboard
- robust-speech-event
datasets:
- mozilla-foundation/common_voice_7_0
metrics:
- wer
- cer
model-index:
- name: wav2vec2-60-urdu
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech Recognition
    dataset:
      type: mozilla-foundation/common_voice_7_0
      name: Common Voice ur
      args: ur
    metrics:
    - type: wer
      value: 59.1
      name: Test WER
      args:
      - learning_rate: 0.0003
      - train_batch_size: 16
      - eval_batch_size: 8
      - seed: 42
      - gradient_accumulation_steps: 2
      - total_train_batch_size: 32
      - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
      - lr_scheduler_type: linear
      - lr_scheduler_warmup_steps: 200
      - num_epochs: 50
      - mixed_precision_training: Native AMP
    - type: cer
      value: 33.1
      name: Test CER
      args:
      - learning_rate: 0.0003
      - train_batch_size: 16
      - eval_batch_size: 8
      - seed: 42
      - gradient_accumulation_steps: 2
      - total_train_batch_size: 32
      - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
      - lr_scheduler_type: linear
      - lr_scheduler_warmup_steps: 200
      - num_epochs: 50
      - mixed_precision_training: Native AMP
---
# wav2vec2-large-xlsr-53-urdu

This model is a fine-tuned version of [Harveenchadha/vakyansh-wav2vec2-urdu-urm-60](https://huggingface.co./Harveenchadha/vakyansh-wav2vec2-urdu-urm-60) on the common_voice dataset.
It achieves the following results on the evaluation set:
- Wer: 0.5913
- Cer: 0.3310
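
A minimal transcription sketch with 🤗 Transformers is shown below. The repository id is a placeholder taken from the model name in the metadata, and the audio path is illustrative; substitute the actual Hub id of this checkpoint and your own 16 kHz recording.

```python
import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Placeholder repo id; replace with the actual Hub id of this checkpoint.
model_id = "wav2vec2-60-urdu"

processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# Load and resample the audio to 16 kHz, the rate wav2vec2 expects.
speech, _ = librosa.load("sample.wav", sr=16_000)

inputs = processor(speech, sampling_rate=16_000, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```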

## Model description
The combined training and validation data amounts to only 0.58 hours of audio. Training any model from scratch on so little data is difficult, so I took the vakyansh-wav2vec2-urdu-urm-60 checkpoint and fine-tuned the wav2vec2 model on it.
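
As a rough sketch, the Urdu split of Common Voice 7.0 can be loaded from the Hub as below; the dataset is gated, so an accepted license and a Hugging Face token are assumed.

```python
from datasets import load_dataset, Audio

# Gated dataset: accept its terms on the Hub and log in first
# (e.g. via `huggingface-cli login`).
common_voice = load_dataset(
    "mozilla-foundation/common_voice_7_0", "ur",
    split="train+validation",
    use_auth_token=True,
)

# Resample the audio column to the 16 kHz rate that wav2vec2 expects.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))
print(common_voice)
```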

## Training procedure
Fine-tuned from Harveenchadha/vakyansh-wav2vec2-urdu-urm-60 because of the small number of available samples.


### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- num_epochs: 50
- mixed_precision_training: Native AMP
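
These settings map onto 🤗 `TrainingArguments` roughly as in the sketch below; the output path and the evaluation/logging cadence are assumptions, not the exact script used for this run.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the listed hyperparameters; paths and
# step intervals are placeholders, not the original training script.
training_args = TrainingArguments(
    output_dir="./wav2vec2-60-urdu",
    learning_rate=3e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 16 * 2 = 32
    num_train_epochs=50,
    lr_scheduler_type="linear",
    warmup_steps=200,
    fp16=True,                       # Native AMP mixed-precision training
    evaluation_strategy="steps",
    eval_steps=100,
    save_steps=100,
    logging_steps=100,
)
```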

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer    | Cer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|
| 12.6045       | 8.33  | 100  | 8.4997          | 0.6978 | 0.3923 |
| 1.3367        | 16.67 | 200  | 5.0015          | 0.6515 | 0.3556 |
| 0.5344        | 25.0  | 300  | 9.3687          | 0.6393 | 0.3625 |
| 0.2922        | 33.33 | 400  | 9.2381          | 0.6236 | 0.3432 |
| 0.1867        | 41.67 | 500  | 6.2150          | 0.6035 | 0.3448 |
| 0.1166        | 50.0  | 600  | 6.4496          | 0.5913 | 0.3310 |
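
For reference, the reported WER and CER can be computed from decoded predictions and reference transcripts, for example with `jiwer`; the strings below are illustrative only, not the actual evaluation transcripts.

```python
from jiwer import wer, cer

# Illustrative examples; the real evaluation uses the Common Voice ur test split.
references = ["یہ ایک مثال ہے"]
predictions = ["یہ ایک مسال ہے"]

print(f"WER: {wer(references, predictions):.4f}")
print(f"CER: {cer(references, predictions):.4f}")
```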


### Framework versions

- Transformers 4.15.0
- Pytorch 1.10.0+cu111
- Datasets 1.17.0
- Tokenizers 0.10.3