chrisociepa committed
Commit 4dee57c
1 Parent(s): 6b53686

Update README.md

Files changed (1):
  1. README.md +7 -5
README.md CHANGED
@@ -118,8 +118,7 @@ Models have been evaluated on [Open PL LLM Leaderboard](https://huggingface.co/s
 - Reader (Generator) - open book question answering task, commonly used in RAG
 - Perplexity (lower is better) - as a bonus, does not correlate with other scores and should not be used for model comparison
 
-Current scores of pretrained and continuously pretrained models according to Open PL LLM Leaderboard 5-shot
-
+As of April 3, 2024, the following table showcases the current scores of pretrained and continuously pretrained models according to the Open PL LLM Leaderboard, evaluated in a 5-shot setting:
 
 | | Average | RAG Reranking | RAG Reader | Perplexity |
 |--------------------------------------------------------------------------------------|----------:|--------------:|-----------:|-----------:|
@@ -132,7 +131,7 @@ Current scores of pretrained and continuously pretrained models according to Ope
 | mistralai/Mistral-7B-v0.1 | 30.67 | 60.35 | 85.39 | 857.32 |
 | internlm/internlm2-7b | 33.03 | 69.39 | 73.63 | 5498.23 |
 | alpindale/Mistral-7B-v0.2-hf | 33.05 | 60.23 | 85.21 | 932.60 |
-| speakleash/mistral-apt3-7B/spi-e0_hf | **35.50** | **62.14** | 87.48 | 132.78 |
+| speakleash/mistral-apt3-7B/spi-e0_hf (experimental) | **35.50** | **62.14** | 87.48 | 132.78 |
 | | | | | |
 | **Models with different sizes:** | | | | |
 | sdadas/polish-gpt2-xl (1.7B) | -23.22 | 48.07 | 3.04 | 160.95 |
@@ -148,7 +147,10 @@ Current scores of pretrained and continuously pretrained models according to Ope
 | [Bielik-7B-Instruct-v0.1](https://huggingface.co/speakleash/Bielik-7B-Instruct-v0.1) | 39.28 | 61.89 | 86.00 | 277.92 |
 
 
-As you can see, Bielik-7B-v0.1 does not have the best Average score, but it has some clear advantages, e.g. the best score in the RAG Reader task.
+As you can see, Bielik-7B-v0.1 does not have the best Average score, but it has some clear advantages, e.g. the best score in the RAG Reader task.
+
+The results in the above table were obtained without utilizing instruction templates for instructional models, instead treating them like base models.
+This approach could skew the results, as instructional models are optimized with specific instructions in mind.
 
 
 ## Limitations and Biases
@@ -201,7 +203,7 @@ The model could not have been created without the commitment and work of the ent
 [Piotr Rybak](https://www.linkedin.com/in/piotrrybak/)
 and many other wonderful researchers and enthusiasts of the AI world.
 
-Members of the ACK Cyfronet AGH team:
+Members of the ACK Cyfronet AGH team providing valuable support and expertise:
 [Szymon Mazurek](https://www.linkedin.com/in/sz-mazurek-ai/).
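
The Perplexity column in the table above is the exponential of the average negative log-likelihood a model assigns to held-out text. As a rough illustration of what such a number means, here is a minimal sketch using the `transformers` library; the model id and sample sentence are placeholders, and the leaderboard's actual evaluation corpus and windowing procedure are not specified in this commit.

```python
# Minimal perplexity sketch: perplexity = exp(mean negative log-likelihood).
# The model id and the sample text are placeholders; the leaderboard's exact
# corpus and chunking strategy may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "speakleash/Bielik-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

text = "Litwo! Ojczyzno moja! Ty jesteś jak zdrowie."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean cross-entropy
    # loss over the predicted tokens.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity: {torch.exp(loss).item():.2f}")
```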
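
Regarding the note added above about instruction templates: evaluating an instruction-tuned model "like a base model" means feeding it the raw task text rather than the chat-formatted prompt it was fine-tuned on. The sketch below contrasts the two, assuming the tokenizer ships a chat template; it is only an illustration, not the leaderboard's evaluation code, and the question text is a placeholder.

```python
# Contrast raw "base model" prompting with chat-template prompting.
# Assumes the tokenizer defines a chat template; the question is a placeholder.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("speakleash/Bielik-7B-Instruct-v0.1")

question = "Kto napisał 'Pana Tadeusza'?"

# Base-model style (as in the note above): the raw question is passed as plain text.
raw_prompt = question

# Instruct-model style: the same question wrapped in the model's chat template,
# including the special tokens the model was fine-tuned to expect.
chat_prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": question}],
    tokenize=False,
    add_generation_prompt=True,
)

print(raw_prompt)
print(chat_prompt)
```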