HiTZ/latxa-7b-v1

@@ -12,9 +12,9 @@ metrics:
 pipeline_tag: text-generation
 ---
-# **Model Card for Basque Llama 7B**
-Basque LLaMA is a collection of foundation models specifically tuned for Basque. Based on Meta’s LLaMA 2 model family, these models were further trained with highly curated Basque corpora, Euscrawl ([Artetxe et al., 2022](https://aclanthology.org/2022.emnlp-main.499/)). Ranging from 7 billion to 70 billion parameters, these models are currently the biggest and best-performing LLMs built for Basque. This is the 7B repository, links to other models can be found in the index at the bottom.
 # **Model Details**
@@ -22,7 +22,7 @@ Basque LLaMA is a collection of foundation models specifically tuned for Basque.
 ## **Model Description**
-Basque LLaMA is a family of Large Language Models (LLM) based on Meta’s [LLaMA models](https://huggingface.co/meta-llama). Current LLMs exhibit incredible performance for high-resource languages such as English, but, in the case of Basque and other low-resource languages, their performance is close to a random guesser. These limitations push the gap between high- and low-resource languages when it comes to digital development. We present Basque LLaMA to overcome these limitations and promote the development of LLM-based technology and research for the Basque language. Basque LLaMA models follow the same architecture as their original counterparts and were further trained in Euscrawl v1 ([Artetxe et al., 2022](https://aclanthology.org/2022.emnlp-main.499/)), a high-quality Basque corpora.
 The models are released in three sizes: 7B, 13B and 70B.
@@ -32,8 +32,7 @@ The models are released in three sizes: 7B, 13B and 70B.
 * **Model type:** Language model
 * **Language(s) (NLP):** en, eu
 * **License:** llama2
-* **Parent Model:** meta-llama/Llama-2-7B
-* **Resources for more information:** [PAPER/BLOG/POST link]
 * **Contact:** [email protected]
@@ -42,18 +41,22 @@ The models are released in three sizes: 7B, 13B and 70B.
 Use the code below to get started with the model.
 ```python
 from transformers import pipeline
-pipe = pipeline("text-generation", model="HiTZ/basque-llama-2-7b-v1")
-text = "Donosti da Euskal Herriko lekurik"
-pipe(text, max_new_tokens=40)
 >> [
-  {
-    'generated_text': 'Donosti da Euskal Herriko lekurik garestiena alokairuan bizitzeko,'
-    ' eta Donostiako alokairuaren prezioa %11,3 igo da azken urtean'
-  }
 ]
 ```
@@ -96,14 +99,97 @@ Additionally, 100K documents of English data randomly selected from the [Pile](h
 The models were trained using the GPT-NeoX library on the CINECA HPC cluster. All models were trained with an effective batch size of approximately 2M tokens for 1000 to 2000 steps.
-| Model            | Steps | Sequence length | Effective Batch size | Total tokens | GPU hours  |
-| ---------------- | ----- | --------------- | -------------------- | ------------ | ---------- |
-| Basque LLaMA 7B  | 2000  | 4096            | 2M tokens/step       | 4B           | 359.2h     |
-| Basque LLaMA 13B | 1000  | 4096            | 2M tokens/step       | 2B           | 468.8h     |
-| Basque LLaMA 70B | 1680  | 4096            | 2M tokens/step       | 3.4B         | \*6475.52h |
-"*" indicates the time for the entire training process (2000 steps), however the weights of the step 1680 are shared as it is the best checkpoint according to validation loss.
 # **Evaluation**
@@ -120,23 +206,26 @@ We evaluated the models on zero-shot and few-shot settings on generative, multip
 * **Belebele** ([Bandarkar et al.](https://arxiv.org/abs/2308.16884)): Belebele is a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. We evaluated the model in a 5-shot fashion.
     * Data card: [https://huggingface.co/datasets/facebook/belebele](https://huggingface.co/datasets/facebook/belebele)
-* **X-StoryCloze** ([Lin et al.](https://aclanthology.org/2022.emnlp-main.616.pdf)): XStoryCloze consists of the professionally translated version of the English Story Cloze dataset to 10 non-English languages. Story Cloze is a new commonsense reasoning dataset which consists in choosing the correct ending to a four-sentence story. We evaluated the model in a 0-shot fashion.
     * Data card: [https://huggingface.co/datasets/juletxara/xstory_cloze](https://huggingface.co/datasets/juletxara/xstory_cloze)
-* **BasqueGLUE** ([Urbizu et al.](https://aclanthology.org/2022.lrec-1.172.pdf)): BasqueGLUE is a NLU benchmark for Basque. Data card: [https://huggingface.co/datasets/orai-nlp/basqueGLUE](https://huggingface.co/datasets/orai-nlp/basqueGLUE). We evaluated the model in a 5-shot fashion on the following tasks:
-    * **BEC2016eu**: Sentiment analysis on tweets about the 2016 Basque elections campaign.
-    * **VaxxStance**: Stance detection on tweets around the anti-vaccine movement.
-    * **BTHCv2**: Topic classification of news extracts with 12 categories.
-    * **EpecKorrefBin**: Correference detection task similar to WSC.
-    * **QNLIeu**: Q&A NLI built from the Basque Wikipedia.
-    * **WiCeu**: Basque Word-in-Context task.
 ### **Metrics**
-* Accuracy: Belebele, X-StoryCloze, EpecKorrefBin, QNLI-eu, and, WiC-eu
-* Micro F1: BEC2016-eu and BHTCv2
-* Macro F1: VaxxStance (favor & against)
 ## **Results**
@@ -144,17 +233,228 @@ We evaluated the models on zero-shot and few-shot settings on generative, multip
 The model was evaluated using the LM Evaluation Harness library from EleutherAI. To reproduce our results, please refer to our [fork](https://github.com/naiarapm/lm-evaluation-harness/tree/basqueglue), which includes the implementation for the datasets mentioned above.
-| Model            | Belebele | X-StoryCloze | BEC   | Vaxx  | BHTC  | coref | QNLI  | WiC   | Average |
-| ---------------- | -------- | ------------ | ----- | ----- | ----- | ----- | ----- | ----- | ------- |
-| Random           | 25.00    | 50.00        | 33.33 | 33.33 | 8.33  | 50.00 | 50.00 | 50.00 | 37.50   |
-| LLaMA 2 7B       | 26.22    | 50.43        | 41.63 | 18.60 | 20.06 | 50.94 | 48.32 | 49.64 | 38.23   |
-| LLaMA 2 13B      | 32.00    | 50.63        | 41.09 | 18.25 | 27.35 | 49.23 | 48.74 | 49.21 | 39.56   |
-| LLaMA 2 70B      | 33.56    | 51.62        | 47.47 | 21.01 | 31.01 | 52.98 | 51.26 | 51.57 | 42.56   |
-| BLOOM 7B         | 27.00    | 57.18        | 37.94 | 20.72 | 39.10 | 48.21 | 47.48 | 47.57 | 40.65   |
-| XGLM 7B          | 23.88    | 57.71        | 39.94 | 21.58 | 36.73 | 50.94 | 50.42 | 49.21 | 41.30   |
-| Basque LLaMA 7B  | 35.67    | 63.13        | 55.61 | 45.93 | 44.44 | 50.43 | 55.04 | 50.14 | 50.05   |
-| Basque LLaMA 13B | 53.56    | 65.85        | 53.23 | 48.66 | 53.61 | 62.52 | 57.14 | 54.21 | 56.10   |
-| Basque LLaMA 70B | 71.78    | 67.57        | 63.52 | 48.95 | 49.51 | 79.90 | 58.82 | 55.50 | 61.94   |

 pipeline_tag: text-generation
 ---
+# **Model Card for Basque Llama 7b**
+Basque LLaMA is a collection of foundation models specifically tuned for Basque. Based on Meta’s LLaMA 2 model family, these models were further trained on Euscrawl, a highly curated Basque corpus ([Artetxe et al., 2022](https://aclanthology.org/2022.emnlp-main.499/)). Ranging from 7 billion to 70 billion parameters, these models are currently the biggest and best-performing LLMs built for Basque. This is the 7b repository; links to the other models can be found in the index at the bottom.
 # **Model Details**
 ## **Model Description**
+Basque LLaMA is a family of Large Language Models (LLMs) based on Meta’s [LLaMA models](https://huggingface.co/meta-llama). Current LLMs exhibit strong performance for high-resource languages such as English, but for Basque and other low-resource languages their performance is close to that of a random guesser. These limitations widen the gap between high- and low-resource languages when it comes to digital development. We present Basque LLaMA to overcome these limitations and promote the development of LLM-based technology and research for the Basque language. Basque LLaMA models follow the same architecture as their original counterparts and were further trained on Euscrawl v1 ([Artetxe et al., 2022](https://aclanthology.org/2022.emnlp-main.499/)), a high-quality Basque corpus.
 The models are released in three sizes: 7B, 13B and 70B.
 * **Model type:** Language model
 * **Language(s) (NLP):** en, eu
 * **License:** llama2
+* **Parent Model:** meta-llama/Llama-2-7b
 * **Contact:** [email protected]
 Use the code below to get started with the model.
 ```python
 from transformers import pipeline
+pipe = pipeline("text-generation", model="HiTZ/basque-llama-2-7b-v1")
+text = "Euskara adimen artifizialera iritsi da!"
+pipe(text, max_new_tokens=50, num_beams=5)
 >> [
+ {
+  'generated_text': 'Euskara adimen artifizialera iritsi da!\nEuskararen eta adimen artifizialaren arteko harremana aspaldikoa da,'
+  ' baina azken urteotan aurrerapauso handiak eman dira arlo horretan'
+ }
 ]
 ```
 The models were trained using the GPT-NeoX library on the CINECA HPC cluster. All models were trained with an effective batch size of approximately 2M tokens for 1000 to 2000 steps.
+<table>
+  <tr>
+   <td>Model
+   </td>
+   <td>Steps
+   </td>
+   <td>Sequence length
+   </td>
+   <td>Effective Batch size
+   </td>
+   <td>Total tokens
+   </td>
+   <td>GPU hours
+   </td>
+  </tr>
+  <tr>
+   <td>Basque LLaMA 7B
+   </td>
+   <td><p style="text-align: right">
+2000</p>
+   </td>
+   <td><p style="text-align: right">
+4096</p>
+   </td>
+   <td><p style="text-align: right">
+2M tokens/step</p>
+   </td>
+   <td><p style="text-align: right">
+4B</p>
+   </td>
+   <td><p style="text-align: right">
+359.2h</p>
+   </td>
+  </tr>
+  <tr>
+   <td>Basque LLaMA 13B
+   </td>
+   <td><p style="text-align: right">
+1000</p>
+   </td>
+   <td><p style="text-align: right">
+4096</p>
+   </td>
+   <td><p style="text-align: right">
+2M tokens/step</p>
+   </td>
+   <td><p style="text-align: right">
+2B</p>
+   </td>
+   <td><p style="text-align: right">
+468.8h</p>
+   </td>
+  </tr>
+  <tr>
+   <td>Basque LLaMA 70B
+   </td>
+   <td><p style="text-align: right">
+1680</p>
+   </td>
+   <td><p style="text-align: right">
+4096</p>
+   </td>
+   <td><p style="text-align: right">
+2M tokens/step</p>
+   </td>
+   <td><p style="text-align: right">
+3.4B</p>
+   </td>
+   <td><p style="text-align: right">
+*6475.52h</p>
+   </td>
+  </tr>
+</table>
+\* Time for the entire training process (2000 steps). The released weights are from step 1680, which is the best checkpoint according to validation loss.
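The "Total tokens" column follows from the other columns as steps multiplied by the effective batch size. The sketch below recomputes it from the figures in the table. It assumes that "2M tokens/step" means exactly 2,000,000 tokens (the card does not say whether 2M is a round number or 2^21), and the helper names are illustrative only.

```python
# Figures copied from the training table above.
# ASSUMPTION: "2M tokens/step" is read as exactly 2,000,000 tokens.
TOKENS_PER_STEP = 2_000_000
SEQUENCE_LENGTH = 4096

# Steps of the released checkpoint for each model size.
released_steps = {
    "Basque LLaMA 7B": 2000,
    "Basque LLaMA 13B": 1000,
    "Basque LLaMA 70B": 1680,  # training ran for 2000 steps; step 1680 is released
}


def total_tokens(steps, tokens_per_step=TOKENS_PER_STEP):
    """Tokens seen during training: steps x effective batch size."""
    return steps * tokens_per_step


def sequences_per_step(tokens_per_step=TOKENS_PER_STEP,
                       sequence_length=SEQUENCE_LENGTH):
    """Approximate number of full-length sequences in one batch."""
    return tokens_per_step / sequence_length


for name, steps in released_steps.items():
    print(f"{name}: {total_tokens(steps) / 1e9:.2f}B tokens")

print(f"about {sequences_per_step():.0f} sequences per step")
```

Under this reading, the 70B figure of 3.4B is 3.36B rounded to one decimal. If 2M instead means 2^21 tokens, each step holds exactly 512 sequences of 4096 tokens and the totals are slightly higher.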
 # **Evaluation**
 * **Belebele** ([Bandarkar et al.](https://arxiv.org/abs/2308.16884)): Belebele is a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. We evaluated the model in a 5-shot fashion.
     * Data card: [https://huggingface.co/datasets/facebook/belebele](https://huggingface.co/datasets/facebook/belebele)
+* **X-StoryCloze**: XStoryCloze consists of professional translations of the English StoryCloze dataset into 10 non-English languages. StoryCloze is a commonsense reasoning dataset in which the task is to choose the correct ending to a four-sentence story. We evaluated the model in a 0-shot fashion.
     * Data card: [https://huggingface.co/datasets/juletxara/xstory_cloze](https://huggingface.co/datasets/juletxara/xstory_cloze)
+* **BasqueGLUE** ([Urbizu et al.](https://aclanthology.org/2022.lrec-1.172.pdf)): BasqueGLUE is an NLU benchmark for Basque. We evaluated the model in a 5-shot fashion on the following tasks:
+    * Data card: [https://huggingface.co/datasets/orai-nlp/basqueGLUE](https://huggingface.co/datasets/orai-nlp/basqueGLUE)
+    * Tasks:
+        * **BEC2016eu**: Sentiment analysis on tweets about the 2016 Basque elections campaign.
+        * **VaxxStance**: Stance detection on tweets around the anti-vaccine movement.
+        * **BHTCv2**: Topic classification of news extracts with 12 categories.
+        * **EpecKorrefBin**: Coreference detection task similar to WSC.
+        * **QNLIeu**: Q&A NLI built from the Basque Wikipedia.
+        * **WiCeu**: Basque Word-in-Context task.
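In a k-shot setting, such as the 5-shot setting used for Belebele and BasqueGLUE, the model sees k solved examples before the example it has to answer. The sketch below shows the idea for a multiple-choice task. The prompt template, the class and the function names are assumptions made for illustration; they are not the templates used by the evaluation harness.

```python
from dataclasses import dataclass
from typing import List

LETTERS = "ABCD"


@dataclass
class MultipleChoiceExample:
    # Hypothetical container; real datasets use their own field names.
    passage: str
    question: str
    options: List[str]
    answer_index: int  # 0-based index into options


def format_example(example, include_answer):
    """Render one example; leave the answer blank for the query."""
    lines = [f"Passage: {example.passage}", f"Question: {example.question}"]
    for letter, option in zip(LETTERS, example.options):
        lines.append(f"{letter}. {option}")
    answer = f" {LETTERS[example.answer_index]}" if include_answer else ""
    lines.append(f"Answer:{answer}")
    return "\n".join(lines)


def build_few_shot_prompt(shots, query, k=5):
    """Concatenate k solved demonstrations followed by the unsolved query."""
    demonstrations = [format_example(shot, include_answer=True) for shot in shots[:k]]
    return "\n\n".join(demonstrations + [format_example(query, include_answer=False)])


# Placeholder data, only to show the shape of the prompt.
shots = [
    MultipleChoiceExample(f"passage {i}", f"question {i}", ["w", "x", "y", "z"], i % 4)
    for i in range(5)
]
query = MultipleChoiceExample("new passage", "new question", ["w", "x", "y", "z"], 0)
prompt = build_few_shot_prompt(shots, query, k=5)
print(prompt)
```

With `k=0` the same function yields a zero-shot prompt containing only the query, which corresponds to the setting used for X-StoryCloze.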
 ### **Metrics**
+* **Accuracy**: Belebele, X-StoryCloze, EpecKorrefBin, QNLI-eu, and WiC-eu
+* **Micro F1**: BEC2016-eu and BHTCv2
+* **Macro F1**: VaxxStance (favor & against)
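Micro F1 pools true positives, false positives and false negatives over all classes before computing F1, while macro F1 computes F1 per class and averages the results. The sketch below is a minimal pure-Python illustration of that difference, including a macro average restricted to a subset of classes as in the VaxxStance entry. The labels and predictions are made-up placeholders, and the function names are not taken from the evaluation harness.

```python
from collections import Counter


def per_class_counts(gold, pred):
    """Count true positives, false positives and false negatives per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # missed the gold label g
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when there is nothing to count."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_f1(gold, pred, labels):
    """Pool the counts over the given labels, then compute a single F1."""
    tp, fp, fn = per_class_counts(gold, pred)
    return f1(sum(tp[l] for l in labels),
              sum(fp[l] for l in labels),
              sum(fn[l] for l in labels))


def macro_f1(gold, pred, labels):
    """Compute F1 per label, then take the unweighted mean."""
    tp, fp, fn = per_class_counts(gold, pred)
    return sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)


# Placeholder stance labels and predictions, for illustration only.
gold = ["favor", "favor", "against", "none", "none", "none"]
pred = ["favor", "against", "against", "none", "none", "favor"]

micro_all = micro_f1(gold, pred, labels=["favor", "against", "none"])
macro_favor_against = macro_f1(gold, pred, labels=["favor", "against"])
print(f"micro F1 (all classes): {micro_all:.4f}")
print(f"macro F1 (favor & against): {macro_favor_against:.4f}")
```

For single-label classification, micro F1 over all classes equals accuracy, which is why the two can differ noticeably from macro F1 when classes are imbalanced.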
 ## **Results**
 The model was evaluated using the LM Evaluation Harness library from EleutherAI. To reproduce our results, please refer to our [fork](https://github.com/naiarapm/lm-evaluation-harness/tree/basqueglue), which includes the implementation for the datasets mentioned above.
+<table>
+  <tr>
+   <td><strong>Model</strong>
+   </td>
+   <td><strong>Belebele</strong>
+   </td>
+   <td><strong>X-StoryCloze</strong>
+   </td>
+   <td><strong>BEC</strong>
+   </td>
+   <td><strong>Vaxx</strong>
+   </td>
+   <td><strong>BHTC</strong>
+   </td>
+   <td><strong>coref</strong>
+   </td>
+   <td><strong>QNLI</strong>
+   </td>
+   <td><strong>WiC</strong>
+   </td>
+   <td><strong>Average</strong>
+   </td>
+  </tr>
+  <tr>
+   <td>Random
+   </td>
+   <td>25.00
+   </td>
+   <td>50.00
+   </td>
+   <td>33.33
+   </td>
+   <td>33.33
+   </td>
+   <td>8.33
+   </td>
+   <td>50.00
+   </td>
+   <td>50.00
+   </td>
+   <td>50.00
+   </td>
+   <td>37.50
+   </td>
+  </tr>
+  <tr>
+   <td>LLaMA 2 7B
+   </td>
+   <td>26.22
+   </td>
+   <td>50.43
+   </td>
+   <td>41.63
+   </td>
+   <td>18.60
+   </td>
+   <td>20.06
+   </td>
+   <td>50.94
+   </td>
+   <td>48.32
+   </td>
+   <td>49.64
+   </td>
+   <td>38.23
+   </td>
+  </tr>
+  <tr>
+   <td>LLaMA 2 13B
+   </td>
+   <td>32.00
+   </td>
+   <td>50.63
+   </td>
+   <td>41.09
+   </td>
+   <td>18.25
+   </td>
+   <td>27.35
+   </td>
+   <td>49.23
+   </td>
+   <td>48.74
+   </td>
+   <td>49.21
+   </td>
+   <td>39.56
+   </td>
+  </tr>
+  <tr>
+   <td>LLaMA 2 70B
+   </td>
+   <td>33.56
+   </td>
+   <td>51.62
+   </td>
+   <td>47.47
+   </td>
+   <td>21.01
+   </td>
+   <td>31.01
+   </td>
+   <td>52.98
+   </td>
+   <td>51.26
+   </td>
+   <td>51.57
+   </td>
+   <td>42.56
+   </td>
+  </tr>
+  <tr>
+   <td>BLOOM 7B
+   </td>
+   <td>27.00
+   </td>
+   <td>57.18
+   </td>
+   <td>37.94
+   </td>
+   <td>20.72
+   </td>
+   <td>39.10
+   </td>
+   <td>48.21
+   </td>
+   <td>47.48
+   </td>
+   <td>47.57
+   </td>
+   <td>40.65
+   </td>
+  </tr>
+  <tr>
+   <td>XGLM 7B
+   </td>
+   <td>23.88
+   </td>
+   <td>57.71
+   </td>
+   <td>39.94
+   </td>
+   <td>21.58
+   </td>
+   <td>36.73
+   </td>
+   <td>50.94
+   </td>
+   <td>50.42
+   </td>
+   <td>49.21
+   </td>
+   <td>41.30
+   </td>
+  </tr>
+  <tr>
+   <td><strong>Basque LLaMA 7B</strong>
+   </td>
+   <td>35.67
+   </td>
+   <td>63.13
+   </td>
+   <td>55.61
+   </td>
+   <td>45.93
+   </td>
+   <td>44.44
+   </td>
+   <td>50.43
+   </td>
+   <td>55.04
+   </td>
+   <td>50.14
+   </td>
+   <td>50.05
+   </td>
+  </tr>
+  <tr>
+   <td><strong>Basque LLaMA 13B</strong>
+   </td>
+   <td>53.56
+   </td>
+   <td>65.85
+   </td>
+   <td>53.23
+   </td>
+   <td>48.66
+   </td>
+   <td><strong>53.61</strong>
+   </td>
+   <td>62.52
+   </td>
+   <td>57.14
+   </td>
+   <td>54.21
+   </td>
+   <td>56.10
+   </td>
+  </tr>
+  <tr>
+   <td><strong>Basque LLaMA 70B</strong>
+   </td>
+   <td><strong>71.78</strong>
+   </td>
+   <td><strong>67.57</strong>
+   </td>
+   <td><strong>63.52</strong>
+   </td>
+   <td><strong>48.95</strong>
+   </td>
+   <td>49.51
+   </td>
+   <td><strong>79.90</strong>
+   </td>
+   <td><strong>58.82</strong>
+   </td>
+   <td><strong>55.50</strong>
+   </td>
+   <td><strong>61.94</strong>
+   </td>
+  </tr>
+</table>
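The Average column appears to be the unweighted mean of the eight task scores in each row. The sketch below checks that reading against a few rows copied from the table. The variable and function names are illustrative, and the interpretation of the column is an assumption that the card does not state explicitly.

```python
# Per-task scores copied from the results table, in column order:
# Belebele, X-StoryCloze, BEC, Vaxx, BHTC, coref, QNLI, WiC.
scores = {
    "Random": [25.00, 50.00, 33.33, 33.33, 8.33, 50.00, 50.00, 50.00],
    "Basque LLaMA 7B": [35.67, 63.13, 55.61, 45.93, 44.44, 50.43, 55.04, 50.14],
    "Basque LLaMA 13B": [53.56, 65.85, 53.23, 48.66, 53.61, 62.52, 57.14, 54.21],
    "Basque LLaMA 70B": [71.78, 67.57, 63.52, 48.95, 49.51, 79.90, 58.82, 55.50],
}

# Values of the Average column for the same rows.
reported_average = {
    "Random": 37.50,
    "Basque LLaMA 7B": 50.05,
    "Basque LLaMA 13B": 56.10,
    "Basque LLaMA 70B": 61.94,
}


def mean_score(values):
    """Unweighted mean over tasks (ASSUMED definition of the Average column)."""
    return sum(values) / len(values)


for model, values in scores.items():
    print(f"{model}: recomputed {mean_score(values):.2f}, "
          f"reported {reported_average[model]:.2f}")
```

Because the mean is unweighted, tasks with very different chance levels (8.33 for BHTC versus 50.00 for the binary tasks) contribute equally, which is worth keeping in mind when comparing averages across models.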