---
language:
- zh
- en
license: llama3
datasets:
- KaifengGGG/WenYanWen_English_Parallel
metrics:
- bleu
- chrf
- meteor
- bertscore
---

# Model Summary

Hanscripter is an instruction-tuned language model for translating Classical Chinese (Wenyanwen, 文言文) into English.

See our [GitHub repo](https://github.com/Kaifeng-Gao/HanScripter) for more details.

- Base Model: Meta-Llama-3-8B-Instruct
- SFT Dataset: KaifengGGG/WenYanWen_English_Parallel
- Fine-tuning Method: QLoRA

# Version

# Usage

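
The snippet below is a minimal inference sketch with the `transformers` library. The repository id `KaifengGGG/HanScripter`, the system prompt, and the example sentence are placeholders for illustration; prompts are assumed to follow the Llama-3-Instruct chat template of the base model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical Hub id -- replace with the actual repository id of this model.
model_id = "KaifengGGG/HanScripter"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Llama-3-Instruct chat format; the system prompt and input are illustrative.
messages = [
    {"role": "system", "content": "You are a translator of Classical Chinese (文言文) into English."},
    {"role": "user", "content": "Translate into English: 學而時習之，不亦說乎？"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Llama-3-Instruct uses <|eot_id|> to end assistant turns.
terminators = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>")]

outputs = model.generate(input_ids, max_new_tokens=256, eos_token_id=terminators, do_sample=False)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
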
# Fine-tuning Details

The parameters and configuration used for QLoRA fine-tuning are described below.

## LoRA Parameters

- **lora_r**: 64
- **lora_alpha**: 16
- **lora_dropout**: 0.1

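
For reference, a minimal sketch of how these values map onto a `peft` `LoraConfig`. The `target_modules` list is not stated in this card and is an assumption (projection layers commonly adapted in Llama-style models).

```python
from peft import LoraConfig

# LoRA hyperparameters listed above. target_modules is an assumption
# (a common choice for Llama-style models) and is not specified in this card.
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```
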
## Quantization

The base model is loaded with bitsandbytes 4-bit quantization to reduce memory usage during fine-tuning:

- **use_4bit**: `True` - Enables 4-bit quantization.
- **bnb_4bit_compute_dtype**: "float16" - Compute datatype used with the quantized weights.
- **bnb_4bit_quant_type**: "nf4" - Quantization data type (NormalFloat4).
- **use_nested_quant**: `False` - Nested (double) quantization is not used.

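
These flags correspond roughly to a `transformers` `BitsAndBytesConfig` as sketched below; the actual loading code in the training script may differ.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization matching the flags above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # use_4bit
    bnb_4bit_compute_dtype=torch.float16,  # bnb_4bit_compute_dtype
    bnb_4bit_quant_type="nf4",             # bnb_4bit_quant_type
    bnb_4bit_use_double_quant=False,       # use_nested_quant
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```
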
## Training Arguments

Training was run with the following settings:

- **num_train_epochs**: 10
- **fp16**: `False`
- **bf16**: `True` - Brain Floating Point (bf16) precision, suited to A100 GPUs.
- **per_device_train_batch_size**: 2
- **per_device_eval_batch_size**: 2
- **gradient_accumulation_steps**: 4
- **gradient_checkpointing**: `True`
- **max_grad_norm**: 0.3
- **learning_rate**: 0.0002
- **weight_decay**: 0.001
- **optim**: "paged_adamw_32bit"
- **lr_scheduler_type**: "cosine"
- **max_steps**: -1
- **warmup_ratio**: 0.03
- **group_by_length**: `True`

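
As a sketch, these settings map onto a `transformers` `TrainingArguments` object roughly as follows; `output_dir` is a placeholder and the actual training script may differ in details.

```python
from transformers import TrainingArguments

# Training settings listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="./hanscripter-qlora",
    num_train_epochs=10,
    fp16=False,
    bf16=True,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    gradient_checkpointing=True,
    max_grad_norm=0.3,
    learning_rate=2e-4,
    weight_decay=0.001,
    optim="paged_adamw_32bit",
    lr_scheduler_type="cosine",
    max_steps=-1,
    warmup_ratio=0.03,
    group_by_length=True,
)
```

In a typical QLoRA setup, this object would be passed to a `trl` `SFTTrainer` together with the 4-bit base model, the `LoraConfig` sketched above, and the KaifengGGG/WenYanWen_English_Parallel dataset.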