Update README.md

6d79062 verified 2 months ago

11.9 kB

	---
	license: apache-2.0
	language:
	- en
	- de
	- es
	- fr
	- it
	- pt
	- pl
	- nl
	- tr
	- sv
	- cs
	- el
	- hu
	- ro
	- fi
	- uk
	- sl
	- sk
	- da
	- lt
	- lv
	- et
	- bg
	- no
	- ca
	- hr
	- ga
	- mt
	- gl
	- zh
	- ru
	- ko
	- ja
	- ar
	- hi
	---
	# Model Card for EuroLLM-1.7B-Instruct


	This is the model card for the first instruction tuned model of the EuroLLM series: EuroLLM-1.7B-Instruct. You can also check the pre-trained version: [EuroLLM-1.7B](https://huggingface.co./utter-project/EuroLLM-1.7B).

	- Developed by: Unbabel, Instituto Superior Técnico, University of Edinburgh, Aveni, University of Paris-Saclay, University of Amsterdam, Naver Labs, Sorbonne Université.
	- Funded by: European Union.
	- Model type: A 1.7B parameter instruction tuned multilingual transfomer LLM.
	- Language(s) (NLP): Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.
	- License: Apache License 2.0.

	## Model Details

	The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages as well as some additional relevant languages.
	EuroLLM-1.7B is a 1.7B parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: Web data, parallel data (en-xx and xx-en), and high-quality datasets.
	EuroLLM-1.7B-Instruct was further instruction tuned on EuroBlocks, an instruction tuning dataset with focus on general instruction-following and machine translation.


	### Model Description

	EuroLLM uses a standard, dense Transformer architecture:
	- We use grouped query attention (GQA) with 8 key-value heads, since it has been shown to increase speed at inference time while maintaining downstream performance.
	- We perform pre-layer normalization, since it improves the training stability, and use the RMSNorm, which is faster.
	- We use the SwiGLU activation function, since it has been shown to lead to good results on downstream tasks.
	- We use rotary positional embeddings (RoPE) in every layer, since these have been shown to lead to good performances while allowing the extension of the context length.

	For pre-training, we use 256 Nvidia H100 GPUs of the Marenostrum 5 supercomputer, training the model with a constant batch size of 3,072 sequences, which corresponds to approximately 12 million tokens, using the Adam optimizer, and BF16 precision.
	Here is a summary of the model hyper-parameters:
	\| \| \|
	\|--------------------------------------\|----------------------\|
	\| Sequence Length \| 4,096 \|
	\| Number of Layers \| 24 \|
	\| Embedding Size \| 2,048 \|
	\| FFN Hidden Size \| 5,632 \|
	\| Number of Heads \| 16 \|
	\| Number of KV Heads (GQA) \| 8 \|
	\| Activation Function \| SwiGLU \|
	\| Position Encodings \| RoPE (\Theta=10,000) \|
	\| Layer Norm \| RMSNorm \|
	\| Tied Embeddings \| No \|
	\| Embedding Parameters \| 0.262B \|
	\| LM Head Parameters \| 0.262B \|
	\| Non-embedding Parameters \| 1.133B \|
	\| Total Parameters \| 1.657B \|

	## Run the model

	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "utter-project/EuroLLM-1.7B-Instruct"
	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(model_id)

	text = '<\|im_start\|>system\n<\|im_end\|>\n<\|im_start\|>user\nTranslate the following English source text to Portuguese:\nEnglish: I am a language model for european languages. \nPortuguese: <\|im_end\|>\n<\|im_start\|>assistant\n'

	inputs = tokenizer(text, return_tensors="pt")
	outputs = model.generate(**inputs, max_new_tokens=20)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))



	## Results

	### Machine Translation

	We evaluate EuroLLM-1.7B-Instruct on several machine translation benchmarks: FLORES-200, WMT-23, and WMT-24 comparing it with [Gemma-2B](https://huggingface.co./google/gemma-2b) and [Gemma-7B](https://huggingface.co./google/gemma-7b) (also instruction tuned on EuroBlocks).
	The results show that EuroLLM-1.7B is substantially better than Gemma-2B in Machine Translation and competitive with Gemma-7B.

	#### Flores-200
	\| Model \| AVG \| AVG en-xx \| AVG xx-en \| en-ar \| en-bg \| en-ca \| en-cs \| en-da \| en-de \| en-el \| en-es-latam \| en-et \| en-fi \| en-fr \| en-ga \| en-gl \| en-hi \| en-hr \| en-hu \| en-it \| en-ja \| en-ko \| en-lt \| en-lv \| en-mt \| en-nl \| en-no \| en-pl \| en-pt-br \| en-ro \| en-ru \| en-sk \| en-sl \| en-sv \| en-tr \| en-uk \| en-zh-cn \| ar-en \| bg-en \| ca-en \| cs-en \| da-en \| de-en \| el-en \| es-latam-en \| et-en \| fi-en \| fr-en \| ga-en \| gl-en \| hi-en \| hr-en \| hu-en \| it-en \| ja-en \| ko-en \| lt-en \| lv-en \| mt-en \| nl-en \| no-en \| pl-en \| pt-br-en \| ro-en \| ru-en \| sk-en \| sl-en \| sv-en \| tr-en \| uk-en \| zh-cn-en \|
	\|--------------------------------\|------\|-----------\|-----------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|--------------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|----------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|----------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|--------------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|----------\|-------\|-------\|-------\|-------\|-------\|-------\|-------\|----------\|
	\| EuroLLM-1.7B-Instruct \| 86.75\| 86.49\| 87.01\|85.00\| 89.37\| 84.65\| 89.23\| 89.54\| 87.00\| 87.68\| 86.43\| 88.66\| 89.18\| 87.65\| 74.66\| 86.68\| 76.94\| 84.86\| 86.43\| 88.19\| 89.45\| 87.33\| 87.73\| 87.74\| 67.66\| 87.16\| 90.08\| 88.10\| 89.39\| 89.39\| 88.11\| 88.13\| 87.08\| 89.67\| 87.21\| 87.76\| 86.45\| 86.15\| 87.46\| 87.49\| 87.97\| 89.65\| 88.80\| 86.88\| 86.70\| 88.19\| 88.66\| 88.93\| 81.95\| 87.38\| 87.97\| 86.69\| 87.15\| 87.69\| 86.78\| 86.76\| 85.39\| 86.23\| 76.93\| 87.02\| 90.24\| 85.54\| 89.22\| 88.81\| 86.02\| 87.24\| 86.81\| 89.55\| 87.76\| 86.37\| 86.00 \|
	\| Gemma-2B-EuroBlocks \| 81.56\| 78.93 \| 84.18 \| 75.25 \| 82.46 \| 83.17 \| 82.17 \| 84.40 \| 83.20 \| 79.63 \| 84.15 \| 72.63 \| 81.00 \| 85.12 \| 38.79 \| 82.00 \| 67.00 \| 81.18 \| 78.24 \| 84.80 \| 87.08 \| 82.04 \| 73.02 \| 68.41 \| 56.67 \| 83.30 \| 86.69 \| 83.07 \| 86.82 \| 84.00 \| 84.55 \| 77.93 \| 76.19 \| 80.77 \| 79.76 \| 84.19 \| 84.10 \| 83.67 \| 85.73 \| 86.89 \| 86.38 \| 88.39 \| 88.11 \| 84.68 \| 86.11 \| 83.45 \| 86.45 \| 88.22 \| 50.88 \| 86.44 \| 85.87 \| 85.33 \| 85.16 \| 86.75 \| 85.62 \| 85.00 \| 81.55 \| 81.45 \| 67.90 \| 85.95 \| 89.05 \| 84.18 \| 88.27 \| 87.38 \| 85.13 \| 85.22 \| 83.86 \| 87.83 \| 84.96 \| 85.15 \| 85.10 \|
	\| Gemma-7B-EuroBlocks \| 86.16\| 85.49 \| 86.82 \| 83.39 \| 88.32 \| 85.82 \| 88.88 \| 89.01 \| 86.96 \| 86.62 \| 86.31 \| 84.42 \| 88.11 \| 87.46 \| 61.85 \| 86.10 \| 77.91 \| 87.01 \| 85.81 \| 87.57 \| 89.88 \| 87.24 \| 84.47 \| 83.15 \| 67.13 \| 86.50 \| 90.44 \| 87.57 \| 89.22 \| 89.13 \| 88.58 \| 86.73 \| 84.68 \| 88.16 \| 86.87 \| 88.40 \| 87.11 \| 86.65 \| 87.25 \| 88.17 \| 87.47 \| 89.59 \| 88.44 \| 86.76 \| 86.66 \| 87.55 \| 88.88 \| 88.86 \| 73.46 \| 87.63 \| 88.43 \| 87.12 \| 87.31 \| 87.49 \| 87.20 \| 87.15 \| 85.16 \| 85.96 \| 78.39 \| 86.73 \| 90.52 \| 85.38 \| 89.17 \| 88.75 \| 86.35 \| 86.82 \| 86.21 \| 89.39 \| 88.20 \| 86.45 \| 86.28 \|


	#### WMT-23
	\| Model \| AVG \| AVG en-xx \| AVG xx-en \| AVG xx-xx \| en-de \| en-cs \| en-uk \| en-ru \| en-zh-cn \| de-en \| uk-en \| ru-en \| zh-cn-en \| cs-uk \|
	\|--------------------------------\|------\|-----------\|-----------\|-----------\|-------\|-------\|-------\|-------\|----------\|-------\|-------\|-------\|----------\|-------\|
	\| EuroLLM-1.7B-Instruct \| 83.13 \| 82.91 \| 82.48 \| 86.87\| 81.33\| 85.42\| 81.61\| 82.57\| 83.62\| 84.24\| 85.36\| 81.56\| 78.76\| 86.87 \|
	\| Gemma-2B-EuroBlocks \| 79.86\| 78.35 \| 81.32 \| 81.56 \| 76.54 \| 76.35 \| 77.62 \| 78.88 \| 82.36 \| 82.85 \| 83.83 \| 80.17 \| 78.42 \| 81.56 \|
	\| Gemma-7B-EuroBlocks \| 83.90\| 83.70 \| 83.21 \| 87.61 \| 82.15 \| 84.68 \| 83.05 \| 83.85 \| 84.79 \| 84.40 \| 85.86 \| 82.55 \| 80.01 \| 87.61 \|


	#### WMT-24
	\| Model \| AVG \| AVG en-xx \| AVG xx-xx \| en-es-latam \| en-cs \| en-ru \| en-uk \| en-ja \| en-zh-cn \| en-hi \| cs-uk \| ja-zh-cn \|
	\|---------\|------\|------\|-------\|-------\|-------\|-------\|--------\|--------\|-------\|-------\|-------\|-----\|
	\| EuroLLM-1.7B-Instruct\|79.35\|79.45\|78.96\|79.20\|81.17\|80.82\|79.00\|80.54\|82.39\|80.80\|71.69\|83.16\|74.76\|
	\|Gemma-2B-EuroBlocks\| 74.71\|74.25\|76.57\|75.21\|78.84\|70.40\|74.44\|75.55\|78.32\|78.70\|62.51\|79.97\|73.17\|
	\|Gemma-7B-EuroBlocks\| 80.88\|80.45\|82.60\|80.43\|81.91\|80.14\|80.32\|82.17\|84.08\|81.86\|72.71\|85.55\|79.65\|

	### General Benchmarks
	We also compare EuroLLM-1.7B with [TinyLlama-v1.1](https://huggingface.co./TinyLlama/TinyLlama_v1.1) and [Gemma-2B](https://huggingface.co./google/gemma-2b) on 3 general benchmarks: Arc Challenge and Hellaswag.
	For the non-english languages we use the [Okapi](https://aclanthology.org/2023.emnlp-demo.28.pdf) datasets.
	Results show that EuroLLM-1.7B is superior to TinyLlama-1.1-3T and similar to Gemma-2B on Hellaswag but worse on Arc Challenge. This can be due to the lower number of parameters of EuroLLM-1.7B (1.133B non-embedding parameters against 1.981B).

	#### Arc Challenge
	\| Model \| Average \| English \| German \| Spanish \| French \| Italian \| Portuguese \| Chinese \| Russian \| Dutch \| Arabic \| Swedish \| Hindi \| Hungarian \| Romanian \| Ukrainian \| Danish \| Catalan \|
	\|--------------------\|---------\|---------\|--------\|---------\|--------\|---------\|------------\|---------\|---------\|-------\|--------\|---------\|--------\|-----------\|----------\|-----------\|--------\|---------\|
	\| EuroLLM-1.7B-Instruct \| 0.3268 \| 0.3218 \| 0.4070 \| 0.3293 \| 0.3521 \| 0.3370 \| 0.3422 \| 0.3496 \| 0.3060 \| 0.3122 \| 0.3174 \| 0.2866 \| 0.3373 \| 0.2817 \| 0.3031 \| 0.3179 \| 0.3199 \| 0.3248 \| 0.3310 \|
	\| TinyLlama-v1.1 \| 0.2650 \| 0.2583 \| 0.3712 \| 0.2524 \| 0.2795 \| 0.2883 \| 0.2652 \| 0.2906 \| 0.2410 \| 0.2669 \| 0.2404 \| 0.2310 \| 0.2687 \| 0.2354 \| 0.2449 \| 0.2476 \| 0.2524 \| 0.2494 \| 0.2796 \|
	\| Gemma-2B \| 0.3617 \| 0.3540 \| 0.4846 \| 0.3755 \| 0.3940 \| 0.4080 \| 0.3687 \| 0.3872 \| 0.3726 \| 0.3456 \| 0.3328 \| 0.3122 \| 0.3519 \| 0.2851 \| 0.3039 \| 0.3590 \| 0.3601 \| 0.3565 \| 0.3516 \|
	#### Hellaswag
	\| Model \| Average \| English \| German \| Spanish \| French \| Italian \| Portuguese \| Russian \| Dutch \| Arabic \| Swedish \| Hindi \| Hungarian \| Romanian \| Ukrainian \| Danish \| Catalan \|
	\|--------------------\|---------\|---------\|--------\|---------\|--------\|---------\|------------\|---------\|--------\|--------\|---------\|--------\|-----------\|----------\|-----------\|--------\|---------\|
	\| EuroLLM-1.7B-Instruct \| 0.4744 \| 0.4654 \| 0.6084 \| 0.4772 \| 0.5310 \| 0.5260 \| 0.5067 \| 0.5206 \| 0.4674 \| 0.4893 \| 0.4075 \| 0.4813 \| 0.3605 \| 0.4067 \| 0.4598 \| 0.4368 \| 0.4700 \| 0.4405 \|
	\| TinyLlama-v1.1 \|0.3674 \| 0.3503 \| 0.6248 \| 0.3650 \| 0.4137 \| 0.4010 \| 0.3780 \| 0.3892 \| 0.3494 \| 0.3588 \| 0.2880 \| 0.3561 \| 0.2841 \| 0.3073 \| 0.3267 \| 0.3349 \| 0.3408 \| 0.3613 \|
	\| Gemma-2B \|0.4666 \| 0.4499 \| 0.7165 \| 0.4756 \| 0.5414 \| 0.5180 \| 0.4841 \| 0.5081 \| 0.4664 \| 0.4655 \| 0.3868 \| 0.4383 \| 0.3413 \| 0.3710 \| 0.4316 \| 0.4291 \| 0.4471 \| 0.4448 \|