---
language:
- zh
- en
license: llama3
datasets:
- KaifengGGG/WenYanWen_English_Parallel
metrics:
- bleu
- chrf
- meteor
- bertscore
---

# Model Summary

HanScripter is an instruction-tuned language model focused on translating Classical Chinese (i.e., Wenyanwen, 文言文) into English. See our [GitHub repo](https://github.com/Kaifeng-Gao/HanScripter).

- Base Model: Meta-Llama-3-8B-Instruct
- SFT Dataset: KaifengGGG/WenYanWen_English_Parallel
- Fine-tuning Method: QLoRA

# Version

# Usage

The model can be loaded with the `transformers` library; a minimal inference sketch is given at the end of this card, after the fine-tuning details.

# Fine-tuning Details

Below are the parameters and techniques used for fine-tuning.

## LoRA Parameters

- **lora_r**: 64
- **lora_alpha**: 16
- **lora_dropout**: 0.1

## Quantization

The model is quantized with bitsandbytes to reduce memory use and improve computational efficiency:

- **use_4bit**: `True` - Enables 4-bit quantization.
- **bnb_4bit_compute_dtype**: `"float16"` - Datatype used for computation on the quantized weights.
- **bnb_4bit_quant_type**: `"nf4"` - Quantization type (4-bit NormalFloat).
- **use_nested_quant**: `False` - Nested (double) quantization is not used.

These LoRA and quantization settings map directly onto `peft` and `transformers` config objects; see the sketch after the training arguments below.

## Training Arguments

Settings for training the model are as follows:

- **num_train_epochs**: 10
- **fp16**: `False`
- **bf16**: `True` - Brain Floating Point (bf16), optimized for A100 GPUs.
- **per_device_train_batch_size**: 2
- **per_device_eval_batch_size**: 2
- **gradient_accumulation_steps**: 4
- **gradient_checkpointing**: `True`
- **max_grad_norm**: 0.3
- **learning_rate**: 0.0002
- **weight_decay**: 0.001
- **optim**: "paged_adamw_32bit"
- **lr_scheduler_type**: "cosine"
- **max_steps**: -1
- **warmup_ratio**: 0.03
- **group_by_length**: `True`
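
As noted above, the LoRA and quantization settings translate directly into `peft` and `transformers` config objects. A minimal sketch (not the authors' exact training script):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization, matching the Quantization settings above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # use_4bit
    bnb_4bit_compute_dtype=torch.float16,  # bnb_4bit_compute_dtype
    bnb_4bit_quant_type="nf4",             # bnb_4bit_quant_type
    bnb_4bit_use_double_quant=False,       # use_nested_quant
)

# LoRA adapter configuration, matching the LoRA Parameters above
peft_config = LoraConfig(
    r=64,              # lora_r
    lora_alpha=16,     # lora_alpha
    lora_dropout=0.1,  # lora_dropout
    bias="none",
    task_type="CAUSAL_LM",
)
```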
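
These configs can then be combined with the training arguments above in a supervised fine-tuning loop. A minimal sketch, assuming the TRL `SFTTrainer` API (argument names vary across TRL versions; recent releases use `SFTConfig` and `processing_class` instead of `TrainingArguments` and `tokenizer`):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
dataset = load_dataset("KaifengGGG/WenYanWen_English_Parallel", split="train")

# Load the base model with the 4-bit quantization config from the sketch above
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Training arguments, matching the values documented above
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=10,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
    fp16=False,
    bf16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,  # from the LoRA sketch above
    tokenizer=tokenizer,
    # Depending on the dataset schema, a dataset_text_field or
    # formatting_func may also be required here.
)
trainer.train()
```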
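
Finally, for inference (see the Usage section above), a minimal sketch using the `transformers` text-generation pipeline, assuming the fine-tuned weights are published under a Hub repo id such as `KaifengGGG/HanScripter` (hypothetical):

```python
import torch
import transformers

model_id = "KaifengGGG/HanScripter"  # hypothetical repo id

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Chat-style prompt; Llama-3-Instruct models expect the chat template
messages = [
    {"role": "system", "content": "Translate the Classical Chinese text into English."},
    {"role": "user", "content": "学而时习之，不亦说乎？"},
]

outputs = pipeline(messages, max_new_tokens=256)
# The last message in generated_text is the assistant's reply
print(outputs[0]["generated_text"][-1]["content"])
```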