dvres committed
Commit: d1774f3
Parent: bef460b

Update README.md

Files changed (1): README.md (+118 -186)

---
library_name: transformers
license: apache-2.0
language:
- sl
pipeline_tag: text-generation
---

# Model Card for GaMS-1B-Chat

We proudly present the family of GaMS (Generative Model for Slovene) models. The 1B version is based on [Facebook's OPT model](https://huggingface.co/facebook/opt-1.3b) and is adapted for Slovene. GaMS-1B uses a BPE tokenizer with a vocabulary size of 80,000. The tokenizer was trained on Slovene, English, and Croatian data. This is the instruction-tuned version of the model.
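
As a quick illustration, the tokenizer can be loaded directly with Hugging Face Transformers. This is a minimal sketch; the reported vocabulary size should be roughly 80,000:

```python
from transformers import AutoTokenizer

# Load the GaMS tokenizer from the Hub and inspect it
tokenizer = AutoTokenizer.from_pretrained("cjvt/GaMS-1B-Chat")
print("Vocabulary size:", tokenizer.vocab_size)

# Tokenize a short Slovene sentence
print(tokenizer.tokenize("Slovenija je lepa dežela."))
```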

## Acknowledgment

The model was developed within the [PoVeJMo](https://www.cjvt.si/povejmo/en/project/) research program (Adaptive Natural Language Processing with Large Language Models), particularly within the research project titled SloLLaMai -- Open-access computationally efficient models for Slovenian. The program is funded within the Recovery and Resilience Plan by the Slovenian Research and Innovation Agency (ARIS) and NextGenerationEU. The authors also acknowledge the financial support from the Slovenian Research and Innovation Agency (research core funding No. P6-0411 -- Language Resources and Technologies for Slovene).

We thank everyone who worked on data collection and preparation, enabling us to train our model. Special thanks go to Nikola Ljubešić, Tjaša Arčon, Jaka Čibej, Simon Krek, Tomaž Erjavec and Iztok Kosem.

## Basic information

- **Developed by:** a team of researchers at the University of Ljubljana, Faculty of Computer and Information Science, and XLAB.doo. Team members: Domen Vreš, Martin Božič, Aljaž Potočnik, Tomaž Martinčič, Iztok Lebar Bajec, Timotej Petrič and Marko Robnik-Šikonja.
- **Language:** Slovene
- **License:** Apache 2.0
- **Repository:** https://github.com/SloLama/NeMo
- **Paper:** https://www.sdjt.si/wp/wp-content/uploads/2024/09/JT-DH-2024_Vres_Bozic_Potocnik_Martincic_Robnik.pdf

## Intended usage

This version of the model is quite small and lacks safety tuning. Hence, using it as a general-purpose model is **STRONGLY DISCOURAGED!** The model might also contain certain biases. We do not recommend using this model in any language other than Slovene.

The model can, however, be efficiently tuned for specific use cases, as suggested by the promising results of the fine-tuned models on the SuperGLUE and SI-NLI benchmarks.

## How to get started with the model

Inference can be run with the following code snippet:

```python
import transformers

model_id = "cjvt/GaMS-1B-Chat"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    device_map="auto"
)

# Example of response generation
message = [{"role": "user", "content": "Kateri je najpomembnejši dogodek v slovenski zgodovini?"}]
response = pipeline(message, max_length=100)
print("Model's response:", response[0]["generated_text"][-1]["content"])

# Example of conversation chain
new_message = response[0]["generated_text"]
new_message.append({"role": "user", "content": "Lahko bolj podrobno opišeš ta dogodek?"})
response = pipeline(new_message, max_length=500)
print("Model's response:", response[0]["generated_text"][-1]["content"])
```
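
For finer control over decoding (for example, custom sampling parameters), the same exchange can be run without the pipeline wrapper. The sketch below assumes the tokenizer ships the chat template that the pipeline example above relies on:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cjvt/GaMS-1B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Kateri je najpomembnejši dogodek v slovenski zgodovini?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=100, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens
print(tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True))
```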

## Training details

### Training data

The model was additionally pretrained on the following Slovene, English, and Croatian-Bosnian-Serbian (CBS) corpora:

| Corpus | Language | # Tokens | Percentage |
| :----- | :------- | :------: | :--------: |
| MetaFida | Slovene | 3.35 B | 11.9 % |
| KAS | Slovene | 1.66 B | 5.89 % |
| Trendi | Slovene | 0.68 B | 2.4 % |
| mC4 | Slovene | 2.88 B | 10.25 % |
| MaCoCu | Slovene | 2.34 B | 8.3 % |
| CC100 | Slovene | 0.29 B | 1.02 % |
| Riznica | Croatian | 0.11 B | 0.39 % |
| Hr News | Croatian | 2.14 B | 7.59 % |
| MaCoCu HBS | CBS | 8.63 B | 30.69 % |
| Wikipedia | English | 5.61 B | 19.93 % |
| CC-News | English | 0.46 B | 1.64 % |

The total size of additional training data is **28.13 B** tokens.

### Training procedure

The model was trained with the NeMo framework on the Slovene HPC Vega, using 64 A100 GPUs simultaneously, for 4 epochs. The WECHSEL initialization method was used to initialize the embedding matrix of the new vocabulary. All layers apart from the embedding and the output layer were frozen during the first epoch to avoid forgetting. Training took approximately 60 hours. The model was trained with a batch size of 1024 (2 million tokens), using the Adam optimizer and a cosine learning rate scheduler with 10,000 warmup and 5,000 constant steps.
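
For readers who want to reproduce the first-epoch freezing step outside NeMo, a rough Hugging Face equivalent could look like the sketch below. It uses `facebook/opt-1.3b` (the architecture referenced above) as a stand-in checkpoint, and the parameter names (`embed_tokens`, `lm_head`) follow the Transformers OPT implementation; these are assumptions, since the original training was done in NeMo:

```python
from transformers import AutoModelForCausalLM

# Sketch: freeze everything except the token embeddings and the output head,
# mirroring the first epoch of continued pretraining described above.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")

for name, param in model.named_parameters():
    param.requires_grad = ("embed_tokens" in name) or ("lm_head" in name)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```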

### Supervised Finetuning (SFT)

The model was trained on the [GaMS-Instruct](http://hdl.handle.net/11356/1971) dataset (20,000 examples). The curated version of the dataset (7,000 examples) is publicly available. 19,050 examples were used as the training set and 950 examples as the validation set.

The model was LoRA-tuned for 5 epochs with rank 1024. It was trained with a batch size of 64, using the Adam optimizer and a cosine learning rate scheduler with 300 warmup steps.
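
A roughly equivalent configuration with the Hugging Face PEFT library might look like the sketch below. The base checkpoint path is a placeholder, and the target modules, alpha, and dropout values are assumptions; the original tuning was done in NeMo:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder: path or Hub id of the base (pre-SFT) GaMS-1B checkpoint
base_model = AutoModelForCausalLM.from_pretrained("path/to/GaMS-1B")

# Rank 1024 follows the card; the remaining values are illustrative assumptions
lora_config = LoraConfig(
    r=1024,
    lora_alpha=2048,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()
```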

## Evaluation

The models were evaluated on the [Slovene SuperGLUE](https://slobench.cjvt.si/leaderboard/view/3) and [SI-NLI](https://slobench.cjvt.si/leaderboard/view/9) tasks on [SloBench](https://slobench.cjvt.si). Additionally, the models were evaluated on an improved version of the Slovenian-LLM-eval introduced by Aleksa Gordić. All decoder-type models were evaluated with few-shot prompts and were not finetuned on the benchmarks (except for the versions with *finetuned* in the name).
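
As an illustration of the few-shot setup (a toy prompt only; the actual SloBench tasks and prompts differ), labelled examples can be concatenated in front of the query and the model asked to complete the label:

```python
import transformers

pipeline = transformers.pipeline("text-generation", model="cjvt/GaMS-1B-Chat", device_map="auto")

# Two in-context examples followed by the instance to label (illustrative only)
few_shot_prompt = (
    "Besedilo: Ta film mi je bil zelo všeč. Oznaka: pozitivno\n"
    "Besedilo: Hrana je bila grozna. Oznaka: negativno\n"
    "Besedilo: Koncert je bil čudovit. Oznaka:"
)

output = pipeline(few_shot_prompt, max_new_tokens=5, do_sample=False)
print(output[0]["generated_text"])
```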

### SuperGLUE results

| Model | SuperGLUE Average | BoolQ Accuracy | CB Accuracy | CB F1 Score | CB Average | COPA Accuracy | MultiRC EM | MultiRC F1a Score | MultiRC Average | RTE Accuracy | WSC Accuracy |
| :---- | :---------------: | :------------: | :---------: | :---------: | :--------: | :-----------: | :--------: | :---------------: | :-------------: | :----------: | :----------: |
| OPT_GaMS-1B | 0.4408 | 0.5667 | 0.5040 | 0.3885 | 0.4463 | 0.5020 | 0.0961 | 0.2543 | 0.1752 | 0.4138 | 0.5411 |
| GaMS-1B | 0.4604 | 0.5000 | 0.6200 | 0.4565 | 0.5382 | 0.4920 | 0.1351 | 0.2675 | 0.2013 | 0.4828 | 0.5479 |
| OPT_GaMS-1B-Chat | 0.4165 | 0.7000 | 0.3720 | 0.2961 | 0.3341 | 0.4600 | 0.1111 | 0.3448 | 0.2280 | 0.4138 | 0.3630 |
| GaMS-1B-Chat | 0.4570 | **0.8000** | 0.4880 | 0.3023 | 0.3951 | 0.4840 | 0.1081 | 0.2428 | 0.1755 | 0.5172 | 0.3699 |
| OPT_GaMS-1B-Chat finetuned | 0.5645 | 0.7000 | 0.8040 | 0.5884 | 0.6962 | 0.5860 | 0.1021 | 0.4808 | 0.2914 | 0.5862 | 0.5274 |
| GaMS-1B-Chat finetuned | 0.5806 | 0.7333 | **0.8120** | 0.5592 | 0.6856 | 0.5080 | 0.1381 | 0.4882 | 0.3132 | 0.5862 | **0.6575** |
| SlovenianGPT-Chat* | 0.5078 | 0.7333 | 0.3920 | 0.3829 | 0.3874 | **0.6840** | **0.2432** | 0.4944 | **0.3688** | 0.5172 | 0.3562 |
| CroSloEngual BERT | **0.6078** | 0.7333 | 0.7920 | **0.7437** | **0.7679** | 0.5720 | 0.0931 | **0.5241** | 0.3086 | **0.6552** | 0.6096 |

*SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.

### SI-NLI results

| Model | Accuracy | P(entailment) | R(entailment) | F1(entailment) | P(neutral) | R(neutral) | F1(neutral) | P(contradiction) | R(contradiction) | F1(contradiction) |
| :---- | :------: | :-----------: | :-----------: | :------------: | :--------: | :---------: | :---------: | :---------------: | :---------------: | :----------------: |
| OPT_GaMS-1B | 0.3277 | 0.3407 | 0.6754 | 0.4529 | 0.3538 | 0.1402 | 0.2009 | 0.2632 | 0.1524 | 0.1931 |
| GaMS-1B | 0.3317 | 0.3418 | 0.4327 | 0.3819 | 0.3353 | 0.5122 | 0.4053 | 0.2344 | 0.0457 | 0.0765 |
| OPT_GaMS-1B-Chat | 0.3447 | 0.3515 | 0.6784 | 0.4631 | 0.3386 | 0.3293 | 0.3338 | 0.2105 | 0.0122 | 0.0231 |
| GaMS-1B-Chat | 0.3417 | 0.3405 | **0.9737** | 0.5045 | 0.2857 | 0.0061 | 0.0119 | 0.4615 | 0.0183 | 0.0352 |
| OPT_GaMS-1B-Chat finetuned | 0.7244 | 0.7065 | 0.8304 | 0.7634 | 0.7269 | 0.6006 | 0.6578 | 0.7446 | 0.7378 | 0.7412 |
| GaMS-1B-Chat finetuned | 0.7144 | 0.8037 | 0.6345 | 0.7092 | 0.7247 | 0.6341 | 0.6764 | 0.6531 | **0.8780** | 0.7490 |
| SlovenianGPT-Chat* | 0.4729 | 0.4399 | 0.7281 | 0.5485 | 0.3719 | 0.1372 | 0.2004 | 0.5723 | 0.5427 | 0.5571 |
| GPT-3.5-Turbo finetuned | **0.8567** | **0.8464** | 0.8538 | **0.8501** | **0.8041** | **0.8384** | **0.8209** | **0.9260** | **0.8780** | **0.9014** |
| SloBERTa | 0.7375 | 0.8127 | 0.7105 | 0.7582 | 0.6844 | 0.7470 | 0.7143 | 0.7273 | 0.7561 | 0.7414 |
| CroSloEngual BERT | 0.6623 | 0.7147 | 0.6667 | 0.6899 | 0.6072 | 0.6646 | 0.6346 | 0.6719 | 0.6555 | 0.6636 |

*SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.

### Slovenian-LLM-eval results

| Model | ARC-Challenge Accuracy | ARC-Easy Accuracy | BoolQ Accuracy | HellaSwag Accuracy | NQ-Open EM | OpenBookQA Accuracy | PIQA Accuracy | WinoGrande Accuracy |
| :---- | :--------------------: | :---------------: | :------------: | :----------------: | :--------: | :-----------------: | :-----------: | :-----------------: |
| OPT_GaMS-1B | 0.2227 ± 0.0122 | 0.436 ± 0.0102 | 0.378 ± 0.0085 | 0.3394 ± 0.0047 | 0.0003 ± 0.0003 | 0.214 ± 0.0184 | 0.6083 ± 0.0114 | 0.5533 ± 0.014 |
| GaMS-1B | 0.2329 ± 0.0124 | 0.4743 ± 0.0102 | 0.3813 ± 0.0085 | 0.3555 ± 0.0048 | 0.0036 ± 0.001 | 0.22 ± 0.0185 | 0.624 ± 0.0113 | 0.532 ± 0.014 |
| OPT_GaMS-1B-Chat | 0.2355 ± 0.0124 | 0.3960 ± 0.0100 | 0.4398 ± 0.0087 | 0.3459 ± 0.0047 | 0.0011 ± 0.0006 | 0.20 ± 0.0179 | 0.5778 ± 0.0115 | 0.5359 ± 0.014 |
| GaMS-1B-Chat | 0.2517 ± 0.0127 | 0.4394 ± 0.0102 | 0.4502 ± 0.0087 | 0.3634 ± 0.0048 | 0 ± 0 | 0.196 ± 0.0178 | 0.6115 ± 0.0114 | 0.5572 ± 0.014 |
| YugoGPT | 0.2961 ± 0.0133 | 0.4781 ± 0.0102 | 0.3783 ± 0.0085 | 0.3890 ± 0.0047 | 0.0385 ± 0.0032 | 0.226 ± 0.0187 | 0.5816 ± 0.0115 | 0.5588 ± 0.014 |
| SlovenianGPT | **0.3805 ± 0.0142** | **0.6498 ± 0.0098** | 0.4523 ± 0.0087 | **0.4935 ± 0.0050** | **0.0432 ± 0.0034** | **0.27 ± 0.0199** | **0.6937 ± 0.0108** | **0.644 ± 0.0135** |
| SlovenianGPT-Chat* | 0.3567 ± 0.014 | 0.5901 ± 0.0101 | **0.4706 ± 0.0087** | 0.4719 ± 0.0050 | 0.0003 ± 0.0003 | **0.27 ± 0.0199** | 0.6861 ± 0.0108 | 0.6425 ± 0.0135 |

*SlovenianGPT-Chat was obtained by instruction-tuning Aleksa Gordić's [SlovenianGPT](https://huggingface.co/gordicaleksa/SlovenianGPT) on our instruction dataset.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/652d40a78fa1fbb0aae165bb/_2h977RjIu0nI_IJG_9bL.png)