p208p2002/llama-3-zhtw-8B


Llama 3 zhtw

An experiment in Chinese continued pretraining (CP) on Llama 3, trained on a total of 800M tokens.

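Because the quality of the Chinese pretraining corpus still has room for improvement, performance after CP did not surpass the original Llama 3. We compared several Chinese Llama 3 models trained by the open-source community and found a similar pattern.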

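For English, LLaMA 3 zhtw uses FineWeb, which gives it a higher MMLU score than the other Chinese CP models and keeps its English ability on par with the original LLaMA 3.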

Benchmarks

Models	Size	↑ TMMLU+ (ACC)	CMMLU (ACC)	MMLU (ACC)
		TC, Knowledge	CN, Knowledge	EN, Knowledge
		5 shot	5 shot	5 shot
Yi-6B	6B	49.63	75.53	65.35
Qwen-7B	7B	42.84	73.1	61.00
Meta-Llama-3-8B	8B	41.97	50.8	65.17
p208p2002/llama-3-zhtw-8B	8B	41.84	50.6	65.31
Breeze-7B-Base-v0_1	7B	40.35	44.05	61.63
hfl/llama-3-chinese-8b	8B	39.64	50.9	61.1
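To make the comparison against the base model concrete, the following sketch computes per-benchmark differences from the table above. The scores are copied from the table; the helper name `deltas` and the dictionary layout are illustrative choices, not part of the original card.

```python
# Scores copied from the benchmark table (5-shot accuracy, in percent).
BASE = {"TMMLU+": 41.97, "CMMLU": 50.8, "MMLU": 65.17}  # Meta-Llama-3-8B
ZHTW = {"TMMLU+": 41.84, "CMMLU": 50.6, "MMLU": 65.31}  # p208p2002/llama-3-zhtw-8B
HFL = {"TMMLU+": 39.64, "CMMLU": 50.9, "MMLU": 61.1}    # hfl/llama-3-chinese-8b


def deltas(model, reference):
    """Return model minus reference for each benchmark, rounded to 2 decimals."""
    return {name: round(model[name] - reference[name], 2) for name in reference}


if __name__ == "__main__":
    print("llama-3-zhtw-8B vs base:", deltas(ZHTW, BASE))
    print("llama-3-chinese-8b vs base:", deltas(HFL, BASE))
```

On these numbers, llama-3-zhtw-8B stays within about 0.2 points of Meta-Llama-3-8B on all three benchmarks, while hfl/llama-3-chinese-8b is roughly 4 points lower on MMLU.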

Recipe

Datasets

Dataset	Lang	Weight
FineWeb	en	0.35
Wudao	zh-cn	0.1
C4Tw	zh-tw	0.1
WikiZhTw	zh-tw	0.15
NdltdT10	zh-tw	0.1
GitHubMarkDown	code	0.1
GitHubPython	code	0.1
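As a rough illustration of what the mixture implies, the sketch below converts the weights into approximate token budgets. It assumes the weights are sampling proportions applied to the 800M-token total stated above; the card does not say exactly how the weights were applied, so treat the per-dataset figures as estimates.

```python
TOTAL_TOKENS = 800_000_000  # total CP tokens stated in the card

# (language, weight) per dataset, copied from the table above.
MIXTURE = {
    "FineWeb": ("en", 0.35),
    "Wudao": ("zh-cn", 0.1),
    "C4Tw": ("zh-tw", 0.1),
    "WikiZhTw": ("zh-tw", 0.15),
    "NdltdT10": ("zh-tw", 0.1),
    "GitHubMarkDown": ("code", 0.1),
    "GitHubPython": ("code", 0.1),
}


def token_budget(mixture, total):
    """Approximate tokens per dataset, assuming weights are token proportions."""
    return {name: round(weight * total) for name, (_, weight) in mixture.items()}


def share_by_language(mixture):
    """Sum the mixture weights per language tag."""
    shares = {}
    for lang, weight in mixture.values():
        shares[lang] = round(shares.get(lang, 0.0) + weight, 2)
    return shares


if __name__ == "__main__":
    for name, tokens in token_budget(MIXTURE, TOTAL_TOKENS).items():
        print(f"{name}: ~{tokens / 1e6:.0f}M tokens")
    print(share_by_language(MIXTURE))
```

Under that assumption, English and Traditional Chinese each receive about 35% of the budget, code 20%, and Simplified Chinese 10%.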

Hyper Parameters

Learning Rate: 1e-7
Global Batch Size: 60
Sequence Length: 8192
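These settings allow a back-of-the-envelope estimate of the training length. The sketch assumes the global batch size counts sequences and that every sequence is packed to the full 8192 tokens; neither detail is stated in the card, so the step count is only an approximation.

```python
import math

GLOBAL_BATCH_SIZE = 60      # sequences per optimizer step (assumed unit)
SEQUENCE_LENGTH = 8192      # tokens per sequence
TOTAL_TOKENS = 800_000_000  # total CP tokens stated in the card


def tokens_per_step(batch_size, seq_len):
    """Tokens consumed by one optimizer step with fully packed sequences."""
    return batch_size * seq_len


def estimated_steps(total_tokens, batch_size, seq_len):
    """Optimizer steps needed to consume total_tokens, rounded up."""
    return math.ceil(total_tokens / tokens_per_step(batch_size, seq_len))


if __name__ == "__main__":
    print(tokens_per_step(GLOBAL_BATCH_SIZE, SEQUENCE_LENGTH))
    print(estimated_steps(TOTAL_TOKENS, GLOBAL_BATCH_SIZE, SEQUENCE_LENGTH))
```

With these assumptions, each step consumes 491,520 tokens and the full run takes on the order of 1,600 steps.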


Safetensors

Model size: 8.03B params
Tensor type: BF16
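A minimal sketch of loading the checkpoint for text generation with the Hugging Face `transformers` library, assuming `transformers` and `torch` are installed and enough memory is available. The helper names `bf16_weight_gb` and `generate`, the prompt, and the generation settings are illustrative and do not come from the card. Since this is a base (non-instruct) model, the example uses plain text completion rather than a chat template.

```python
MODEL_ID = "p208p2002/llama-3-zhtw-8B"
NUM_PARAMS = 8.03e9  # parameter count reported for the checkpoint


def bf16_weight_gb(num_params):
    """Approximate weight memory in GB at 2 bytes per BF16 parameter."""
    return num_params * 2 / 1e9


def generate(prompt, max_new_tokens=64):
    """Load the model in BF16 and complete the prompt (downloads the weights)."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(f"Weights alone need about {bf16_weight_gb(NUM_PARAMS):.2f} GB")
    print(generate("台灣最高的山是"))
```

The memory figure covers only the weights; activations and the KV cache add to it during generation.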


Collection including p208p2002/llama-3-zhtw-8B

LLaMA-zhtw (6 items, updated Jun 11, 2024)