---
language:
- zh
- en
license: llama3
datasets:
- KaifengGGG/WenYanWen_English_Parallel
metrics:
- bleu
- chrf
- meteor
- bertscore
---
# Model Summary
Hanscripter is an instruction-tuned language model focused on translating Classical Chinese (Wenyanwen, 文言文) into English.
See our [GitHub repo](https://github.com/Kaifeng-Gao/HanScripter) for code and details.
- Base Model: Meta-Llama-3-8B-Instruct
- SFT Dataset: KaifengGGG/WenYanWen_English_Parallel
- Fine-tune Method: QLoRA
# Version
# Usage
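The model can be used like any other Llama-3-8B-Instruct derivative via 🤗 Transformers. Below is a minimal inference sketch; the model id `KaifengGGG/HanScripter`, the system prompt, and the example sentence are illustrative assumptions, so substitute the actual repository id and your own prompt.

```python
# Minimal inference sketch (model id and prompt are illustrative assumptions).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "KaifengGGG/HanScripter"  # hypothetical repo id; replace with the actual one

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "Translate the following Classical Chinese into English."},
    {"role": "user", "content": "学而时习之，不亦说乎？"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```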
# Fine-tuning Details
Below are the parameters and techniques used for fine-tuning, each followed by an illustrative configuration sketch.
## LoRA Parameters
- **lora_r**: 64
- **lora_alpha**: 16
- **lora_dropout**: 0.1
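For reference, these values map onto a `peft` `LoraConfig` roughly as in the sketch below; `bias` and `task_type` are typical defaults assumed here, not values taken from this card.

```python
# Sketch of how the LoRA hyperparameters above map to a peft LoraConfig.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,              # lora_r
    lora_alpha=16,     # lora_alpha
    lora_dropout=0.1,  # lora_dropout
    bias="none",       # assumed default, not stated in this card
    task_type="CAUSAL_LM",
)
```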
## Quantization
The model is quantized with bitsandbytes to reduce memory usage and improve computational efficiency during fine-tuning:
- **use_4bit**: `True` - Enables the use of 4-bit quantization.
- **bnb_4bit_compute_dtype**: "float16" - The data type used for computation with the quantized weights.
- **bnb_4bit_quant_type**: "nf4" - Specifies the quantization type.
- **use_nested_quant**: `False` - Nested quantization is not used.
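These settings correspond to a 4-bit `BitsAndBytesConfig` in 🤗 Transformers along the lines of the sketch below.

```python
# Sketch of the 4-bit quantization settings above as a transformers BitsAndBytesConfig.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # use_4bit
    bnb_4bit_compute_dtype=torch.float16,  # bnb_4bit_compute_dtype
    bnb_4bit_quant_type="nf4",             # bnb_4bit_quant_type
    bnb_4bit_use_double_quant=False,       # use_nested_quant
)
```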
## Training Arguments
Settings for training the model are as follows:
- **num_train_epochs**: 10
- **fp16**: `False`
- **bf16**: `True` - Optimized for use with A100 GPUs, employing Brain Floating Point (bf16).
- **per_device_train_batch_size**: 2
- **per_device_eval_batch_size**: 2
- **gradient_accumulation_steps**: 4
- **gradient_checkpointing**: `True`
- **max_grad_norm**: 0.3
- **learning_rate**: 0.0002
- **weight_decay**: 0.001
- **optim**: "paged_adamw_32bit"
- **lr_scheduler_type**: "cosine"
- **max_steps**: -1
- **warmup_ratio**: 0.03
- **group_by_length**: `True`
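Assuming these values were passed through the 🤗 Transformers training stack, they correspond roughly to the following `TrainingArguments` sketch; the output directory is a placeholder.

```python
# Sketch of the training settings above as transformers TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./hanscripter-qlora",  # placeholder path
    num_train_epochs=10,
    fp16=False,
    bf16=True,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
)
```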