chrisociepa committed
Commit 4dee57c
1 Parent(s): 6b53686

Update README.md

Files changed (1):
  1. README.md +7 -5
README.md CHANGED
@@ -118,8 +118,7 @@ Models have been evaluated on [Open PL LLM Leaderboard](https://huggingface.co/s
 - Reader (Generator) - open book question answering task, commonly used in RAG
 - Perplexity (lower is better) - as a bonus, does not correlate with other scores and should not be used for model comparison
 
-Current scores of pretrained and continuously pretrained models according to Open PL LLM Leaderboard 5-shot
-
+As of April 3, 2024, the following table showcases the current scores of pretrained and continuously pretrained models according to the Open PL LLM Leaderboard, evaluated in a 5-shot setting:
 
 | | Average | RAG Reranking | RAG Reader | Perplexity |
 |--------------------------------------------------------------------------------------|----------:|--------------:|-----------:|-----------:|
@@ -132,7 +131,7 @@ Current scores of pretrained and continuously pretrained models according to Ope
 | mistralai/Mistral-7B-v0.1 | 30.67 | 60.35 | 85.39 | 857.32 |
 | internlm/internlm2-7b | 33.03 | 69.39 | 73.63 | 5498.23 |
 | alpindale/Mistral-7B-v0.2-hf | 33.05 | 60.23 | 85.21 | 932.60 |
-| speakleash/mistral-apt3-7B/spi-e0_hf | **35.50** | **62.14** | 87.48 | 132.78 |
+| speakleash/mistral-apt3-7B/spi-e0_hf (experimental) | **35.50** | **62.14** | 87.48 | 132.78 |
 | | | | | |
 | **Models with different sizes:** | | | | |
 | sdadas/polish-gpt2-xl (1.7B) | -23.22 | 48.07 | 3.04 | 160.95 |
@@ -148,7 +147,10 @@ Current scores of pretrained and continuously pretrained models according to Ope
 | [Bielik-7B-Instruct-v0.1](https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1) | 39.28 | 61.89 | 86.00 | 277.92 |
 
 
-As you can see, Bielik-7B-v0.1 does not have the best Average score, but it has some clear advantages, e.g. the best score in the RAG Reader task.
+As you can see, Bielik-7B-v0.1 does not have the best Average score, but it has some clear advantages, e.g. the best score in the RAG Reader task.
+
+The results in the above table were obtained without utilizing instruction templates for instructional models, instead treating them like base models.
+This approach could skew the results, as instructional models are optimized with specific instructions in mind.
 
 
 ## Limitations and Biases
@@ -201,7 +203,7 @@ The model could not have been created without the commitment and work of the ent
 [Piotr Rybak](https://www.linkedin.com/in/piotrrybak/)
 and many other wonderful researchers and enthusiasts of the AI world.
 
-Members of the ACK Cyfronet AGH team:
+Members of the ACK Cyfronet AGH team providing valuable support and expertise:
 [Szymon Mazurek](https://www.linkedin.com/in/sz-mazurek-ai/).
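
The Perplexity column in the table above is the exponential of the average negative log-likelihood a model assigns to held-out text. As a rough illustration of what such a number means, here is a minimal sketch using the `transformers` library; the model id and sample sentence are placeholders, and the leaderboard's actual evaluation corpus and windowing procedure are not specified in this commit.

```python
# Minimal perplexity sketch: perplexity = exp(mean negative log-likelihood).
# The model id and the sample text are placeholders; the leaderboard's exact
# corpus and chunking strategy may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "speakleash/Bielik-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

text = "Litwo! Ojczyzno moja! Ty jesteś jak zdrowie."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean cross-entropy
    # loss over the predicted tokens.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.2f}")
```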
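
Regarding the note added above about instruction templates: evaluating an instruction-tuned model "like a base model" means feeding it the raw task text rather than the chat-formatted prompt it was fine-tuned on. The sketch below contrasts the two, assuming the tokenizer ships a chat template; it is only an illustration, not the leaderboard's evaluation code, and the question text is a placeholder.

```python
# Contrast raw "base model" prompting with chat-template prompting.
# Assumes the tokenizer defines a chat template; the question is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("speakleash/Bielik-7B-Instruct-v0.1")

question = "Kto napisał 'Pana Tadeusza'?"

# Base-model style (as in the note above): the raw question is passed as plain text.
raw_prompt = question

# Instruct-model style: the same question wrapped in the model's chat template,
# including the special tokens the model was fine-tuned to expect.
chat_prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)

print(raw_prompt)
print(chat_prompt)
```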