File size: 3,404 Bytes
873a17c ed7e4ac b8353a4 ed7e4ac 873a17c ed7e4ac 873a17c a0d7b6c ed7e4ac 873a17c ed37d85 ed7e4ac ed37d85 ed7e4ac b8353a4 ed7e4ac b8353a4 873a17c 7c41b61 873a17c 90342ae 7c41b61 ed7e4ac 873a17c ed7e4ac 873a17c d49441f 90342ae 873a17c ed7e4ac 873a17c d49441f 873a17c |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 |
---
language:
- ur
license: apache-2.0
tags:
- automatic-speech-recognition
- robust-speech-event
datasets:
- common_voice_7
metrics:
- wer
- cer
model-index:
- name: wav2vec2-60-urdu
results:
- task:
type: automatic-speech-recognition # Required. Example: automatic-speech-recognition
name: Urdu Speech Recognition # Optional. Example: Speech Recognition
dataset:
type: common_voice_7 # Required. Example: common_voice. Use dataset id from https://hf.co/datasets
name: Urdu # Required. Example: Common Voice zh-CN
args: ur # Optional. Example: zh-CN
metrics:
- type: wer # Required. Example: wer
value: 59.2 # Required. Example: 20.90
name: Test WER # Optional. Example: Test WER
args:
- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- num_epochs: 50
- mixed_precision_training: Native AMP # Optional. Example for BLEU: max_order
- type: cer # Required. Example: wer
value: 32.9 # Required. Example: 20.90
name: Test CER # Optional. Example: Test WER
args:
- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- num_epochs: 50
- mixed_precision_training: Native AMP # Optional. Example for BLEU: max_order
---
# wav2vec2-large-xlsr-53-urdu
This model is a fine-tuned version of [Harveenchadha/vakyansh-wav2vec2-urdu-urm-60](https://huggingface.co./Harveenchadha/vakyansh-wav2vec2-urdu-urm-60) on the common_voice dataset.
It achieves the following results on the evaluation set:
- Wer: 0.5921
- Cer: 0.3288
## Model description
The training and valid dataset is 0.58 hours. It was hard to train any model on lower number of so I decided to take Urdu-60 checkpoint and finetune the wav2vwc2 model.
## Training procedure
Trained on Harveenchadha/vakyansh-wav2vec2-urdu-urm-60 due to lesser number of samples.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0003
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 200
- num_epochs: 50
- mixed_precision_training: Native AMP
### Training results
| Training Loss | Epoch | Step | Wer | Cer |
|:-------------:|:-----:|:----:|:------:|:------:|
| 13.83 | 8.33 | 100 | 0.6611 | 0.3639 |
| 1.0144 | 16.67 | 200 | 0.6498 | 0.3731 |
| 0.5801 | 25.0 | 300 | 0.6454 | 0.3767 |
| 0.3344 | 33.33 | 400 | 0.6349 | 0.3548 |
| 0.1606 | 41.67 | 500 | 0.6105 | 0.3348 |
| 0.0974 | 50.0 | 600 | 0.5921 | 0.3288 |
### Framework versions
- Transformers 4.15.0
- Pytorch 1.10.0+cu111
- Datasets 1.17.0
- Tokenizers 0.10.3
|