Updates to evaluations from Yet Another LLM Leaderboard results
README.md
CHANGED
@@ -35,7 +35,7 @@ Evaluations done using mlabonne's usefull [Colab notebook llm-autoeval](https://
 Also check out the alternative leaderboard at [Yet_Another_LLM_Leaderboard](https://huggingface.co/spaces/mlabonne/Yet_Another_LLM_Leaderboard)
 | Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
 |----------------------------------------------------------------|------:|------:|---------:|-------:|------:|
-|[phi-2-orange](https://huggingface.co/rhysjones/phi-2-orange)| **33.
+|[phi-2-orange](https://huggingface.co/rhysjones/phi-2-orange)| **33.37**| 71.33| 49.87| **37.3**| **47.97**|
 |[phi-2-dpo](https://huggingface.co/lxuechen/phi-2-dpo)| 30.39| **71.68**| **50.75**| 34.9| 46.93|
-|[dolphin-2_6-phi-2](https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2)| 33.12| 69.85| 47.39|
+|[dolphin-2_6-phi-2](https://huggingface.co/cognitivecomputations/dolphin-2_6-phi-2)| 33.12| 69.85| 47.39| 37.2| 46.89|
 |[phi-2](https://huggingface.co/microsoft/phi-2)| 27.98| 70.8| 44.43| 35.21| 44.61|
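The Average column in the updated rows appears to be the arithmetic mean of the four benchmark columns (AGIEval, GPT4All, TruthfulQA, Bigbench). The diff does not state this, so treat it as an assumption. A minimal sketch that checks it against the values shown above; the helper name `mean_matches` and the 0.01 tolerance are illustrative choices, not part of the README or of llm-autoeval:

```python
# Scores copied from the updated table: ([AGIEval, GPT4All, TruthfulQA, Bigbench], Average)
rows = {
    "phi-2-orange": ([33.37, 71.33, 49.87, 37.3], 47.97),
    "phi-2-dpo": ([30.39, 71.68, 50.75, 34.9], 46.93),
    "dolphin-2_6-phi-2": ([33.12, 69.85, 47.39, 37.2], 46.89),
    "phi-2": ([27.98, 70.8, 44.43, 35.21], 44.61),
}


def mean_matches(scores, reported, tol=0.01):
    """Return True if the plain mean of `scores` is within `tol` of `reported`.

    Assumption: Average is an unweighted mean rounded to two decimals.
    The tolerance is deliberately loose to absorb that rounding.
    """
    return abs(sum(scores) / len(scores) - reported) < tol


for name, (scores, reported) in rows.items():
    computed = sum(scores) / len(scores)
    print(f"{name}: computed={computed:.4f} reported={reported} ok={mean_matches(scores, reported)}")
```

If a future update changes one benchmark score without updating Average, the corresponding row would report `ok=False`.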