wav2vec2-large-xlsr-53-common_voice-ja-demo-kana-only

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the Japanese (ja) subset of the mozilla-foundation/common_voice_13_0 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6985
  • Wer: 0.9998
  • Cer: 0.3126
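
The gap between a WER near 1.0 and a CER near 0.31 is worth a note. One plausible reading, offered here as an assumption rather than something the card states, is that the kana-only transcripts contain no spaces, so each utterance counts as a single "word" and any character error makes the whole word wrong. The sketch below uses a plain Levenshtein distance to illustrate that effect; the functions `edit_distance`, `cer`, and `wer` and the example strings are hypothetical and are not the metric code used to produce the numbers above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate: character edits divided by reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)


def wer(ref, hyp):
    """Word error rate over whitespace-separated tokens."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


# Hypothetical unspaced kana transcript with one wrong character.
reference = "きょうはいいてんきです"
hypothesis = "きょうはいいてんきてす"

print("CER:", cer(reference, hypothesis))
print("WER:", wer(reference, hypothesis))
```

With unspaced text, a single wrong character is enough to push WER to 1.0 for that utterance while CER stays small, so CER is probably the more informative metric for this model.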

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 15.0
  • mixed_precision_training: Native AMP
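
As a rough illustration of how these values fit together, the sketch below recomputes the effective batch size and the learning rate under a linear schedule with warmup. Several things are assumptions: a single training device, about 376 optimizer steps per epoch (inferred from the training log, where step 100 corresponds to epoch 0.2660), and the usual linear-warmup, linear-decay formula. The helper names are hypothetical.

```python
BASE_LR = 3e-4           # learning_rate
WARMUP_STEPS = 500       # lr_scheduler_warmup_steps
PER_DEVICE_BATCH = 16    # train_batch_size
GRAD_ACCUM = 2           # gradient_accumulation_steps
NUM_DEVICES = 1          # assumption: single-device training
STEPS_PER_EPOCH = 376    # assumption: inferred from step 100 ~ epoch 0.2660
NUM_EPOCHS = 15
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS


def effective_batch_size(per_device, grad_accum, num_devices):
    """Number of samples contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


def linear_lr(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print("effective batch size:",
      effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM, NUM_DEVICES))
print("total optimizer steps:", TOTAL_STEPS)
for s in (0, 250, 500, 3000, TOTAL_STEPS):
    print(f"step {s:>5}: lr = {linear_lr(s):.6f}")
```

Under these assumptions the learning rate peaks at step 500 and then decays to zero over the remaining steps.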

Training results

Training Loss | Epoch | Step | Validation Loss | Wer | Cer
No log | 0.2660 | 100 | 6.8309 | 1.0 | 0.9999
No log | 0.5319 | 200 | 4.1299 | 1.0 | 0.9999
No log | 0.7979 | 300 | 3.9930 | 1.0 | 0.9869
No log | 1.0638 | 400 | 2.0400 | 1.0 | 0.5876
7.1041 | 1.3298 | 500 | 1.0610 | 1.0 | 0.4309
7.1041 | 1.5957 | 600 | 0.8837 | 1.0 | 0.3955
7.1041 | 1.8617 | 700 | 0.7706 | 0.9998 | 0.3791
7.1041 | 2.1277 | 800 | 0.7662 | 1.0 | 0.3816
7.1041 | 2.3936 | 900 | 0.7621 | 1.0 | 0.3790
0.803 | 2.6596 | 1000 | 0.6969 | 1.0 | 0.3626
0.803 | 2.9255 | 1100 | 0.6736 | 1.0 | 0.3573
0.803 | 3.1915 | 1200 | 0.6823 | 0.9998 | 0.3544
0.803 | 3.4574 | 1300 | 0.6360 | 1.0 | 0.3460
0.803 | 3.7234 | 1400 | 0.6504 | 1.0 | 0.3443
0.5675 | 3.9894 | 1500 | 0.6247 | 1.0 | 0.3414
0.5675 | 4.2553 | 1600 | 0.6397 | 0.9998 | 0.3425
0.5675 | 4.5213 | 1700 | 0.6589 | 1.0 | 0.3439
0.5675 | 4.7872 | 1800 | 0.6345 | 1.0 | 0.3449
0.5675 | 5.0532 | 1900 | 0.6522 | 0.9996 | 0.3380
0.4421 | 5.3191 | 2000 | 0.6293 | 1.0 | 0.3372
0.4421 | 5.5851 | 2100 | 0.6096 | 1.0 | 0.3342
0.4421 | 5.8511 | 2200 | 0.6108 | 1.0 | 0.3321
0.4421 | 6.1170 | 2300 | 0.6200 | 1.0 | 0.3354
0.4421 | 6.3830 | 2400 | 0.6413 | 1.0 | 0.3341
0.3699 | 6.6489 | 2500 | 0.6303 | 0.9996 | 0.3359
0.3699 | 6.9149 | 2600 | 0.6013 | 1.0 | 0.3308
0.3699 | 7.1809 | 2700 | 0.6343 | 1.0 | 0.3286
0.3699 | 7.4468 | 2800 | 0.6208 | 0.9998 | 0.3260
0.3699 | 7.7128 | 2900 | 0.6095 | 0.9998 | 0.3287
0.3146 | 7.9787 | 3000 | 0.6058 | 0.9996 | 0.3266
0.3146 | 8.2447 | 3100 | 0.6613 | 0.9996 | 0.3251
0.3146 | 8.5106 | 3200 | 0.6539 | 1.0 | 0.3244
0.3146 | 8.7766 | 3300 | 0.6331 | 1.0 | 0.3264
0.3146 | 9.0426 | 3400 | 0.6436 | 1.0 | 0.3228
0.2576 | 9.3085 | 3500 | 0.6329 | 1.0 | 0.3235
0.2576 | 9.5745 | 3600 | 0.6315 | 0.9998 | 0.3197
0.2576 | 9.8404 | 3700 | 0.6281 | 0.9998 | 0.3203
0.2576 | 10.1064 | 3800 | 0.6696 | 0.9996 | 0.3196
0.2576 | 10.3723 | 3900 | 0.6630 | 0.9996 | 0.3199
0.2201 | 10.6383 | 4000 | 0.6781 | 1.0 | 0.3203
0.2201 | 10.9043 | 4100 | 0.6531 | 1.0 | 0.3196
0.2201 | 11.1702 | 4200 | 0.6763 | 0.9998 | 0.3193
0.2201 | 11.4362 | 4300 | 0.6785 | 1.0 | 0.3184
0.2201 | 11.7021 | 4400 | 0.6664 | 0.9998 | 0.3179
0.1931 | 11.9681 | 4500 | 0.6682 | 0.9998 | 0.3184
0.1931 | 12.2340 | 4600 | 0.6800 | 0.9998 | 0.3168
0.1931 | 12.5 | 4700 | 0.6925 | 1.0 | 0.3162
0.1931 | 12.7660 | 4800 | 0.7047 | 1.0 | 0.3145
0.1931 | 13.0319 | 4900 | 0.6919 | 0.9998 | 0.3147
0.1694 | 13.2979 | 5000 | 0.6999 | 0.9998 | 0.3142
0.1694 | 13.5638 | 5100 | 0.6995 | 1.0 | 0.3134
0.1694 | 13.8298 | 5200 | 0.6917 | 0.9998 | 0.3134
0.1694 | 14.0957 | 5300 | 0.6963 | 0.9998 | 0.3129
0.1694 | 14.3617 | 5400 | 0.6961 | 0.9998 | 0.3128
0.1548 | 14.6277 | 5500 | 0.6964 | 1.0 | 0.3129
0.1548 | 14.8936 | 5600 | 0.6984 | 0.9998 | 0.3127
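
In the log above, validation loss and CER do not bottom out at the same point. The sketch below takes a handful of rows copied from the table and picks the best one under each criterion. The row subset and the helper name `best_row` are illustrative choices, not part of the original training script.

```python
# (step, validation_loss, cer) for a subset of rows from the training log.
rows = [
    (500, 1.0610, 0.4309),
    (1500, 0.6247, 0.3414),
    (2600, 0.6013, 0.3308),
    (3500, 0.6329, 0.3235),
    (4500, 0.6682, 0.3184),
    (5600, 0.6984, 0.3127),
]


def best_row(rows, column):
    """Return the row with the smallest value in the given column index."""
    return min(rows, key=lambda row: row[column])


best_by_loss = best_row(rows, 1)
best_by_cer = best_row(rows, 2)

print("lowest validation loss at step", best_by_loss[0])
print("lowest CER at step", best_by_cer[0])
```

Among these rows, validation loss is lowest at step 2600 while CER is lowest at step 5600. This suggests that checkpoint selection depends on which metric matters for the downstream use, and that rising validation loss alone may not indicate worse transcriptions here.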

Framework versions

  • Transformers 4.47.0.dev0
  • PyTorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3

Model size: 316M params (Safetensors, tensor type F32)