laurentiubp committed
Commit 56be4e6
1 Parent(s): 62b10d9

Update README.md

Files changed (1)
  1. README.md +80 -21
README.md CHANGED
@@ -2,45 +2,105 @@
  license: llama3
  base_model: catallama/CataLlama-v0.2-Instruct-SFT
  tags:
- - trl
- - dpo
- - generated_from_trainer
  model-index:
  - name: CataLlama-v0.2-Instruct-DPO
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

  # CataLlama-v0.2-Instruct-DPO

- GSM8K: 60.05
- MMLU: 58.89

  ## Benchmarks

  | Model | CataLlama-v0.1-Instruct-DPO | CataLlama-v0.2-Instruct-DPO |
  | ------------------ | --------------------------- | ------------------------------- |
  | MMLU 5 shot | 47.34 | **58.89** |
- | GSM8K cot 8 shot | 43.29 | **60.05** |

- This model is a fine-tuned version of [catallama/CataLlama-v0.2-Instruct-SFT](https://huggingface.co/catallama/CataLlama-v0.2-Instruct-SFT) on an unknown dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

  ## Training procedure

  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -57,13 +117,12 @@ The following hyperparameters were used during training:
  - lr_scheduler_warmup_steps: 200
  - num_epochs: 2

- ### Training results

- ### Framework versions

- - Transformers 4.38.1
- - Pytorch 2.1.1+cu121
- - Datasets 2.16.1
- - Tokenizers 0.15.2
 
  license: llama3
  base_model: catallama/CataLlama-v0.2-Instruct-SFT
  tags:
+ - llama
+ - llama-3
+ - catalan
  model-index:
  - name: CataLlama-v0.2-Instruct-DPO
  results: []
+ datasets:
+ - catallama/Catalan-DPO-V2
+ language:
+ - ca
+ - en
+ pipeline_tag: text-generation
  ---

+ ![](https://huggingface.co/catallama/CataLlama-v0.2-Instruct-SFT/resolve/main/CataLlama-v0.2.png)
 
  # CataLlama-v0.2-Instruct-DPO

+ **CataLlama-v0.2-Instruct-DPO** is a DPO fine-tune of [catallama/CataLlama-v0.2-Instruct-SFT](https://huggingface.co/catallama/CataLlama-v0.2-Instruct-SFT) on the [catallama/Catalan-DPO-V2](https://huggingface.co/datasets/catallama/Catalan-DPO-V2) dataset.
+
+ CataLlama-v0.2 was trained on roughly **620 million new tokens**, almost 40% more than CataLlama-v0.1.
+
+ The DPO-V2 dataset has been completely rebuilt and is almost twice the size of the DPO-V1 dataset.
+
+ The model shows improved proficiency with the Catalan language.
+
+ **This is an instruction fine-tuned model, optimised with DPO, proficient at the following tasks in Catalan:**
+
+ - *Information extraction (suitable for RAG)*
+ - *Named Entity Recognition (NER)*
+ - *Translation from English to Catalan and Catalan to English*
+ - *Summarization - both short form and long form*
+ - *Sentiment analysis*
+ - *Chat*
+
+ **Model developers** [Laurentiu Petrea](https://www.linkedin.com/in/laurentiupetrea/), based on Llama-3 from Meta.
+
+ **Model Architecture** CataLlama is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and direct preference optimisation (DPO) to align with human preferences for helpfulness and safety.
+
+ **License** The model uses the Llama 3 license, available at [https://llama.meta.com/llama3/license](https://llama.meta.com/llama3/license)
+
  ## Benchmarks

  | Model | CataLlama-v0.1-Instruct-DPO | CataLlama-v0.2-Instruct-DPO |
  | ------------------ | --------------------------- | ------------------------------- |
  | MMLU 5 shot | 47.34 | **58.89** |
+ | GSM8K CoT 8 shot | 43.29 | **60.05** |
 
+ ### Use with transformers
+
+ See the snippet below for usage with Transformers:
+
+ **The model follows the same prompt template as Llama-3 Instruct**
+
+ ```python
+ import transformers
+ import torch
+
+ model_id = "catallama/CataLlama-v0.2-Instruct-DPO"
+
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=model_id,
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "user", "content": "Ei com estàs avui?"},
+ ]
+
+ prompt = pipeline.tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ outputs = pipeline(
+     prompt,
+     max_new_tokens=1024,
+     do_sample=True,
+     temperature=0.6,
+     top_p=0.9,
+ )
+
+ print(outputs[0]["generated_text"][len(prompt):])
+ ```
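
For reference, the prompt string produced by `apply_chat_template` above follows the Llama-3 Instruct chat format. The sketch below only illustrates roughly what that rendered prompt looks like for the example message; the exact special tokens come from the model's tokenizer configuration.

```python
# Illustrative only: roughly what apply_chat_template(..., tokenize=False,
# add_generation_prompt=True) returns for the single user message above.
rendered_prompt = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "Ei com estàs avui?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
```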

  ## Training procedure

+ The model was trained **with the same prompt template as Llama-3 Instruct**.
+
+ The model was trained for two epochs on **8x A100 80GB GPUs using DeepSpeed ZeRO** Stage-3 without CPU offloading.
+
+ The training lasted approximately 3 hours, for a total GPU cost of 45€.
+
  ### Training hyperparameters

  The following hyperparameters were used during training:

  - lr_scheduler_warmup_steps: 200
  - num_epochs: 2
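
For readers who want to set up a similar run, here is a minimal sketch of how a DPO fine-tune like this could be wired together with TRL's `DPOTrainer` and DeepSpeed ZeRO Stage-3. It is not the authors' training script: the `beta` value, the DeepSpeed config file name, the script name, and the assumption that the dataset exposes `prompt`/`chosen`/`rejected` columns are all illustrative.

```python
# Hypothetical DPO training sketch with TRL + DeepSpeed ZeRO Stage-3.
# Launched with something like:  deepspeed dpo_train.py
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "catallama/CataLlama-v0.2-Instruct-SFT"  # SFT checkpoint used as the starting point

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Preference pairs; assumed to contain prompt / chosen / rejected columns.
dataset = load_dataset("catallama/Catalan-DPO-V2", split="train")

args = TrainingArguments(
    output_dir="CataLlama-v0.2-Instruct-DPO",
    num_train_epochs=2,         # matches num_epochs above
    warmup_steps=200,           # matches lr_scheduler_warmup_steps above
    bf16=True,
    deepspeed="ds_zero3.json",  # hypothetical ZeRO Stage-3 config, no CPU offloading
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,   # DPOTrainer builds a frozen reference copy when None
    args=args,
    beta=0.1,         # assumed DPO beta; not stated on the card
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```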

+ ## Intended Use
+
+ **Note:** This model is not intended to beat benchmarks, but to demonstrate techniques for augmenting LLMs with new languages and for preserving rare languages as part of our world heritage.
+
+ **Intended Use Cases** Llama 3 is intended for commercial and research use in English. Instruction tuned models are intended for assistant-like chat, whereas pretrained models can be adapted for a variety of natural language generation tasks.
+
+ **Out-of-scope** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by the Acceptable Use Policy and Llama 3 Community License. Use in languages other than English.
+
+ **Note:** Developers may fine-tune Llama 3 models for languages beyond English provided they comply with the Llama 3 Community License and the Acceptable Use Policy.