bert-base-multilingual-cased-finetuned-yiddish-experiment-3
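This model is a fine-tuned version of bert-base-multilingual-cased. The auto-generated card metadata records the dataset as None; the training file is named under Training and evaluation data. It achieves the following results on the evaluation set: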
- Loss: 1.4254
Model description
More information needed
Intended uses & limitations
Intended for use with a chatbot to correct raw Yiddish machine transcriptions generated by Transkribus.
Training and evaluation data
Training dataset = Gavin model fine tuning_lines.csv
Training procedure
Experiment 3 fine-tunes the pre-trained mBERT (multilingual BERT) model to improve raw handwritten text recognition (HTR) output. The fine-tuning dataset consists of raw HTR outputs paired with their human-corrected ground truth, as recorded in the line.csv file.
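The paired structure described above can be sketched as follows. This is a minimal illustration, not the code used in the experiment: the column names `raw_htr` and `corrected`, the helper names, and the placeholder rows are all hypothetical, and the tokenizer step assumes the `transformers` library is installed. The 64-token limit is the one stated in the data-handling notes of this card.

```python
import csv
import io

MAX_LENGTH = 64  # maximum sequence length stated in this card


def load_pairs(csv_file):
    """Read (raw HTR line, corrected line) pairs from a CSV file object.

    The column names "raw_htr" and "corrected" are assumptions; the
    real file may use different headers.
    """
    reader = csv.DictReader(csv_file)
    return [(row["raw_htr"], row["corrected"]) for row in reader]


def tokenize_pairs(pairs, model_name="bert-base-multilingual-cased"):
    """Tokenize both columns with the mBERT tokenizer (needs transformers)."""
    from transformers import AutoTokenizer  # imported lazily on purpose

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    raw_texts = [raw for raw, _ in pairs]
    corrected_texts = [fixed for _, fixed in pairs]
    options = dict(max_length=MAX_LENGTH, truncation=True, padding="max_length")
    return tokenizer(raw_texts, **options), tokenizer(corrected_texts, **options)


# Placeholder rows standing in for real transcription lines.
sample_csv = io.StringIO(
    "raw_htr,corrected\n"
    "raw line one,corrected line one\n"
    "raw line two,corrected line two\n"
)
pairs = load_pairs(sample_csv)
print(len(pairs), pairs[0])
```

Calling `tokenize_pairs(pairs)` would download the tokenizer and return two padded encodings of 64 token IDs per line. How the corrected text is turned into training labels is not stated in this card, so that step is left out.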
Key Parameters and Rationale:
1. Model Selection: bert-base-multilingual-cased is chosen for its multilingual coverage, to accommodate the linguistic diversity likely present in the handwritten text dataset, including potentially mixed-language inputs and varying character distributions.
2. Data Handling:
- The dataset is loaded and structured into columns for raw HTR text and its hand-corrected counterpart.
- Tokenization is performed using the mBERT tokenizer, with a maximum sequence length of 64 tokens. This length balances capturing sufficient context against memory overhead.
3. Training Configuration:
- Batch Size and Gradient Accumulation: A batch size of 4 with 1 gradient accumulation step (an effective batch size of 4) is chosen, likely due to the memory limitations of the L4 GPU, so each update processes a small chunk of data.
- Learning Rate and Weight Decay: A low learning rate of 5e-6 allows for gradual updates to the pre-trained weights, preserving the pre-trained linguistic knowledge while adapting to the new task. Weight decay is set to 0 to avoid penalizing model parameters unnecessarily for this specific task.
- Gradient Clipping: A maximum gradient norm of 1 guards against exploding gradients, which could destabilize training given the small batch size and the sensitivity of fine-tuning to the learning rate.
- Warm-Up Steps: 300 warm-up steps let the optimizer start with smaller updates, reducing initial instability.
- Epochs and Logging: The model is trained for 10 epochs with evaluation loss logged every 100 steps, giving sufficient training time while monitoring progress regularly.
4. Compute Setup:
- Training was executed on an L4 GPU, which is well suited to NLP workloads of this kind.
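The settings listed above map onto `transformers.TrainingArguments` roughly as shown here. This is a sketch under stated assumptions rather than the original training script: `output_dir` is a hypothetical path, `eval_strategy="steps"` with `eval_steps=100` and `logging_steps=100` is inferred from the 100-step cadence described in this card, and the effective batch size assumes a single GPU.

```python
# Values taken from this card; keys follow transformers.TrainingArguments.
training_kwargs = {
    "output_dir": "mbert-yiddish-experiment-3",  # hypothetical path
    "learning_rate": 5e-6,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "gradient_accumulation_steps": 1,
    "weight_decay": 0.0,
    "max_grad_norm": 1.0,          # gradient clipping threshold
    "warmup_steps": 300,
    "num_train_epochs": 10,
    "lr_scheduler_type": "linear",
    "optim": "adamw_torch",
    "eval_strategy": "steps",      # assumption: evaluate on a step schedule
    "eval_steps": 100,             # assumption: matches the 100-step cadence
    "logging_steps": 100,          # assumption: matches the 100-step cadence
    "seed": 42,
}


def build_training_arguments(kwargs=training_kwargs):
    """Create TrainingArguments from the dict (needs transformers installed)."""
    from transformers import TrainingArguments  # imported lazily on purpose

    return TrainingArguments(**kwargs)


# Examples per optimizer update on one device.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(effective_batch_size)
```

Keeping the values in a plain dict makes them easy to inspect or log without importing the library. `eval_strategy` is the argument name in the Transformers version listed under Framework versions.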
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- num_epochs: 10
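To make the combined effect of the linear scheduler and the 300 warm-up steps concrete, the following sketch computes the learning rate at a given step. It mirrors the usual linear-warmup, linear-decay rule and is not taken from the training run itself. The total of 4230 steps is an assumption, inferred from the results table (about 423 steps per epoch over 10 epochs).

```python
BASE_LR = 5e-6
WARMUP_STEPS = 300
TOTAL_STEPS = 4230  # assumption: ~423 steps per epoch x 10 epochs


def linear_lr(step, base_lr=BASE_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate under linear warm-up followed by linear decay to zero."""
    if step < warmup:
        # Ramp up from 0 to base_lr over the warm-up steps.
        return base_lr * step / warmup
    # Decay linearly from base_lr at the end of warm-up to 0 at the last step.
    return base_lr * max(0.0, (total - step) / (total - warmup))


for step in (0, 150, 300, 2265, 4230):
    print(step, linear_lr(step))
```

Under these assumptions the rate peaks at 5e-06 at step 300, i.e. before the end of the first epoch, and then declines for the remainder of training.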
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
11.143 | 0.2364 | 100 | 7.6591 |
4.1737 | 0.4728 | 200 | 2.2642 |
2.0579 | 0.7092 | 300 | 1.7710 |
1.6963 | 0.9456 | 400 | 1.6712 |
1.5705 | 1.1820 | 500 | 1.6379 |
1.5353 | 1.4184 | 600 | 1.6003 |
1.5213 | 1.6548 | 700 | 1.5273 |
1.4387 | 1.8913 | 800 | 1.5415 |
1.3973 | 2.1277 | 900 | 1.5530 |
1.4266 | 2.3641 | 1000 | 1.5328 |
1.3365 | 2.6005 | 1100 | 1.5154 |
1.4423 | 2.8369 | 1200 | 1.4662 |
1.3948 | 3.0733 | 1300 | 1.5041 |
1.3244 | 3.3097 | 1400 | 1.4530 |
1.3645 | 3.5461 | 1500 | 1.4656 |
1.329 | 3.7825 | 1600 | 1.4542 |
1.3326 | 4.0189 | 1700 | 1.5293 |
1.2768 | 4.2553 | 1800 | 1.4575 |
1.3125 | 4.4917 | 1900 | 1.4638 |
1.2925 | 4.7281 | 2000 | 1.4867 |
1.281 | 4.9645 | 2100 | 1.4827 |
1.2966 | 5.2009 | 2200 | 1.4359 |
1.28 | 5.4374 | 2300 | 1.4761 |
1.2436 | 5.6738 | 2400 | 1.5006 |
1.2787 | 5.9102 | 2500 | 1.4511 |
1.2344 | 6.1466 | 2600 | 1.4430 |
1.199 | 6.3830 | 2700 | 1.4254 |
1.2899 | 6.6194 | 2800 | 1.4339 |
1.2637 | 6.8558 | 2900 | 1.4609 |
1.2186 | 7.0922 | 3000 | 1.4300 |
1.181 | 7.3286 | 3100 | 1.4407 |
1.2815 | 7.5650 | 3200 | 1.4471 |
1.2161 | 7.8014 | 3300 | 1.4413 |
1.1562 | 8.0378 | 3400 | 1.4695 |
1.1668 | 8.2742 | 3500 | 1.4940 |
1.2557 | 8.5106 | 3600 | 1.4430 |
1.1985 | 8.7470 | 3700 | 1.4562 |
1.2051 | 8.9835 | 3800 | 1.4412 |
1.1588 | 9.2199 | 3900 | 1.4421 |
1.2002 | 9.4563 | 4000 | 1.4477 |
1.2339 | 9.6927 | 4100 | 1.4573 |
1.1918 | 9.9291 | 4200 | 1.4463 |
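A short script can pull a few facts out of the table above. The rows below are a subset copied from the table. The steps-per-epoch figure and the approximate number of training examples are inferences from the step and epoch columns, assuming a batch size of 4 on a single device with no gradient accumulation.

```python
# (step, epoch, validation_loss) rows copied from the results table (a subset).
rows = [
    (100, 0.2364, 7.6591),
    (1400, 3.3097, 1.4530),
    (2200, 5.2009, 1.4359),
    (2700, 6.3830, 1.4254),
    (3000, 7.0922, 1.4300),
    (4200, 9.9291, 1.4463),
]

# Checkpoint with the lowest validation loss among the listed rows.
best_step, best_epoch, best_loss = min(rows, key=lambda r: r[2])

# Optimizer steps per epoch, estimated from the last logged row.
steps_per_epoch = round(rows[-1][0] / rows[-1][1])

# Rough training-set size, assuming 4 examples per step (last batch may be partial).
approx_examples = steps_per_epoch * 4

print(best_step, best_loss, steps_per_epoch, approx_examples)
```

The lowest validation loss in the full table, 1.4254 at step 2700, matches the evaluation loss reported at the top of this card. This suggests the reported figure comes from the best checkpoint and not from the final step, although the card does not state this.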
Framework versions
- Transformers 4.47.0
- Pytorch 2.5.1+cu121
- Datasets 3.1.0
- Tokenizers 0.21.0
Model tree for MarineLives/mBert-finetuned-yiddish-experiment-3
Base model
google-bert/bert-base-multilingual-cased