---
language:
- zh
- en
license: llama3
datasets:
- KaifengGGG/WenYanWen_English_Parallel
metrics:
- bleu
- chrf
- meteor
- bertscore
---

# Model Summary

Hanscripter is an instruction-tuned language model for translating Classical Chinese (Wenyanwen, 文言文) into English.

See our [GitHub repo](https://github.com/Kaifeng-Gao/HanScripter) for more details.

- Base Model: Meta-Llama-3-8B-Instruct
- SFT Dataset: KaifengGGG/WenYanWen_English_Parallel
- Fine-tuning Method: QLoRA

# Version

# Usage

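
The snippet below is a minimal inference sketch with the `transformers` library. The repository id `KaifengGGG/HanScripter`, the system prompt, and the example sentence are placeholders for illustration; prompts are assumed to follow the Llama-3-Instruct chat template of the base model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub id -- replace with the actual repository id of this model.
model_id = "KaifengGGG/HanScripter"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Llama-3-Instruct chat format; the system prompt and input are illustrative.
messages = [
    {"role": "system", "content": "You are a translator of Classical Chinese (文言文) into English."},
    {"role": "user", "content": "Translate into English: 學而時習之，不亦說乎？"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Llama-3-Instruct uses <|eot_id|> to end assistant turns.
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]

outputs = model.generate(input_ids, max_new_tokens=256, eos_token_id=terminators, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
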
# Fine-tuning Details

The parameters and configuration used for QLoRA fine-tuning are described below.

## LoRA Parameters

- **lora_r**: 64
- **lora_alpha**: 16
- **lora_dropout**: 0.1

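
For reference, a minimal sketch of how these values map onto a `peft` `LoraConfig`. The `target_modules` list is not stated in this card and is an assumption (projection layers commonly adapted in Llama-style models).

```python
from peft import LoraConfig

# LoRA hyperparameters listed above. target_modules is an assumption
# (a common choice for Llama-style models) and is not specified in this card.
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```
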
## Quantization

The base model is loaded with bitsandbytes 4-bit quantization to reduce memory usage during fine-tuning:

- **use_4bit**: `True` - Enables 4-bit quantization.
- **bnb_4bit_compute_dtype**: "float16" - Compute datatype used with the quantized weights.
- **bnb_4bit_quant_type**: "nf4" - Quantization data type (NormalFloat4).
- **use_nested_quant**: `False` - Nested (double) quantization is not used.

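
These flags correspond roughly to a `transformers` `BitsAndBytesConfig` as sketched below; the actual loading code in the training script may differ.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization matching the flags above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # use_4bit
    bnb_4bit_compute_dtype=torch.float16,  # bnb_4bit_compute_dtype
    bnb_4bit_quant_type="nf4",             # bnb_4bit_quant_type
    bnb_4bit_use_double_quant=False,       # use_nested_quant
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```
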
## Training Arguments

Training was run with the following settings:

- **num_train_epochs**: 10
- **fp16**: `False`
- **bf16**: `True` - Brain Floating Point (bf16) precision, suited to A100 GPUs.
- **per_device_train_batch_size**: 2
- **per_device_eval_batch_size**: 2
- **gradient_accumulation_steps**: 4
- **gradient_checkpointing**: `True`
- **max_grad_norm**: 0.3
- **learning_rate**: 0.0002
- **weight_decay**: 0.001
- **optim**: "paged_adamw_32bit"
- **lr_scheduler_type**: "cosine"
- **max_steps**: -1
- **warmup_ratio**: 0.03
- **group_by_length**: `True`

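
As a sketch, these settings map onto a `transformers` `TrainingArguments` object roughly as follows; `output_dir` is a placeholder and the actual training script may differ in details.

```python
from transformers import TrainingArguments

# Training settings listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="./hanscripter-qlora",
    num_train_epochs=10,
    fp16=False,
    bf16=True,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
)
```

In a typical QLoRA setup, this object would be passed to a `trl` `SFTTrainer` together with the 4-bit base model, the `LoraConfig` sketched above, and the KaifengGGG/WenYanWen_English_Parallel dataset.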