speakleash
/

Bielik-7B-Instruct-v0.1

@@ -119,7 +119,7 @@ Models have been evaluated on [Open PL LLM Leaderboard](https://huggingface.co/s
 - Reader (Generator) - open book question answering task, commonly used in RAG
 - Perplexity (lower is better) - as a bonus, does not correlate with other scores and should not be used for model comparison
 |                                                                                      |   Average | RAG Reranking | RAG Reader | Perplexity |
 |--------------------------------------------------------------------------------------|----------:|--------------:|-----------:|-----------:|
@@ -137,7 +137,7 @@ Models have been evaluated on [Open PL LLM Leaderboard](https://huggingface.co/s
 | mistralai/Mistral-7B-Instruct-v0.2                                                   |     40.29 |         72.58 |      79.39 |    2088.08 |
 | teknium/OpenHermes-2.5-Mistral-7B                                                    |     42.64 |         70.63 |      80.25 |    1463.00 |
 | openchat/openchat-3.5-1210                                                           |     44.17 |         71.76 |      82.15 |    1923.83 |
-| speakleash/mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e1_70c70cc6            |     45.44 |         71.27 |      91.50 |     279.24 |
 | Nexusflow/Starling-LM-7B-beta                                                        |     45.69 |         74.58 |      81.22 |    1161.54 |
 | openchat/openchat-3.5-0106                                                           |     47.32 |         74.71 |      83.60 |    1106.56 |
 | berkeley-nest/Starling-LM-7B-alpha                                                   | **47.46** |     **75.73** |      82.86 |    1438.04 |
@@ -155,13 +155,14 @@ Models have been evaluated on [Open PL LLM Leaderboard](https://huggingface.co/s
 | mistralai/Mistral-7B-v0.1                                                            |     30.67 |         60.35 |      85.39 |     857.32 |
 | internlm/internlm2-7b                                                                |     33.03 |         69.39 |      73.63 |    5498.23 |
 | alpindale/Mistral-7B-v0.2-hf                                                         |     33.05 |         60.23 |      85.21 |     932.60 |
-| speakleash/mistral-apt3-7B/spi-e0_hf                                                 |     35.50 |         62.14 |  **87.48** |     132.78 |
 SpeakLeash models have one of the best scores in the RAG Reader task.
 We have managed to increase Average score by almost 9 pp. in comparison to Mistral-7B-v0.1.
 In our subjective evaluations of chatting skills SpeakLeash models perform better than other models with higher Average scores.
 ## Limitations and Biases
@@ -212,7 +213,7 @@ The model could not have been created without the commitment and work of the ent
 [Remigiusz Kinas](https://www.linkedin.com/in/remigiusz-kinas/),
 and many other wonderful researchers and enthusiasts of the AI world.
-Members of the ACK Cyfronet AGH team:
 [Szymon Mazurek](https://www.linkedin.com/in/sz-mazurek-ai/).
 ## Contact Us

 - Reader (Generator) - open book question answering task, commonly used in RAG
 - Perplexity (lower is better) - as a bonus, does not correlate with other scores and should not be used for model comparison
+As of April 3, 2024, the following table showcases the current scores of pretrained and continuously pretrained models according to the Open PL LLM Leaderboard, evaluated in a 5-shot setting:
 |                                                                                      |   Average | RAG Reranking | RAG Reader | Perplexity |
 |--------------------------------------------------------------------------------------|----------:|--------------:|-----------:|-----------:|
 | mistralai/Mistral-7B-Instruct-v0.2                                                   |     40.29 |         72.58 |      79.39 |    2088.08 |
 | teknium/OpenHermes-2.5-Mistral-7B                                                    |     42.64 |         70.63 |      80.25 |    1463.00 |
 | openchat/openchat-3.5-1210                                                           |     44.17 |         71.76 |      82.15 |    1923.83 |
+| speakleash/mistral_7B-v2/spkl-all_sft_v2/e1_base/spkl-all_2e6-e1_70c70cc6 (experimental) |     45.44 |         71.27 |      91.50 |     279.24 |
 | Nexusflow/Starling-LM-7B-beta                                                        |     45.69 |         74.58 |      81.22 |    1161.54 |
 | openchat/openchat-3.5-0106                                                           |     47.32 |         74.71 |      83.60 |    1106.56 |
 | berkeley-nest/Starling-LM-7B-alpha                                                   | **47.46** |     **75.73** |      82.86 |    1438.04 |
 | mistralai/Mistral-7B-v0.1                                                            |     30.67 |         60.35 |      85.39 |     857.32 |
 | internlm/internlm2-7b                                                                |     33.03 |         69.39 |      73.63 |    5498.23 |
 | alpindale/Mistral-7B-v0.2-hf                                                         |     33.05 |         60.23 |      85.21 |     932.60 |
+| speakleash/mistral-apt3-7B/spi-e0_hf (experimental)                                  |     35.50 |         62.14 |  **87.48** |     132.78 |
 SpeakLeash models have one of the best scores in the RAG Reader task.
 We have managed to increase Average score by almost 9 pp. in comparison to Mistral-7B-v0.1.
 In our subjective evaluations of chatting skills SpeakLeash models perform better than other models with higher Average scores.
+The results in the above table were obtained without utilizing instruction templates for instructional models, instead treating them like base models.
+This approach could skew the results, as instructional models are optimized with specific instructions in mind.
 ## Limitations and Biases
 [Remigiusz Kinas](https://www.linkedin.com/in/remigiusz-kinas/),
 and many other wonderful researchers and enthusiasts of the AI world.
+Members of the ACK Cyfronet AGH team providing valuable support and expertise:
 [Szymon Mazurek](https://www.linkedin.com/in/sz-mazurek-ai/).
 ## Contact Us