Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)

README.md +32 -0

@@ -12,6 +12,9 @@ model-index:
       args:
         num_few_shot: 0
     metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 44.06
+      name: strict accuracy
     - type: inst_level_strict_acc and prompt_level_strict_acc
       value: 44.06
       name: strict accuracy
@@ -27,6 +30,9 @@ model-index:
       args:
         num_few_shot: 3
     metrics:
+    - type: acc_norm
+      value: 47.73
+      name: normalized accuracy
     - type: acc_norm
       value: 47.73
       name: normalized accuracy
@@ -42,6 +48,9 @@ model-index:
       args:
         num_few_shot: 4
     metrics:
+    - type: exact_match
+      value: 7.78
+      name: exact match
     - type: exact_match
       value: 7.78
       name: exact match
@@ -57,6 +66,9 @@ model-index:
       args:
         num_few_shot: 0
     metrics:
+    - type: acc_norm
+      value: 10.4
+      name: acc_norm
     - type: acc_norm
       value: 10.4
       name: acc_norm
@@ -72,6 +84,9 @@ model-index:
       args:
         num_few_shot: 0
     metrics:
+    - type: acc_norm
+      value: 8.73
+      name: acc_norm
     - type: acc_norm
       value: 8.73
       name: acc_norm
@@ -89,6 +104,9 @@ model-index:
       args:
         num_few_shot: 5
     metrics:
+    - type: acc
+      value: 36.96
+      name: accuracy
     - type: acc
       value: 36.96
       name: accuracy
@@ -172,3 +190,17 @@ ___________________________________
 I'm based out of Munich, Germany, but I would be interested in working remotely for a team with more compute than my 2x 4090s 🚀
 #### Reach out via [LinkedIn - Dr David Noel Ng](https://www.linkedin.com/in/dnhkng)
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dnhkng__RYS-Medium)
+|      Metric       |Value|
+|-------------------|----:|
+|Avg.               |25.94|
+|IFEval (0-Shot)    |44.06|
+|BBH (3-Shot)       |47.73|
+|MATH Lvl 5 (4-Shot)| 7.78|
+|GPQA (0-shot)      |10.40|
+|MuSR (0-shot)      | 8.73|
+|MMLU-PRO (5-shot)  |36.96|
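
As a quick sanity check on the table added by this PR, the `Avg.` row can be reproduced from the six benchmark scores. The sketch below assumes that `Avg.` is the unweighted arithmetic mean rounded to two decimals; the PR itself does not state how the value is computed, and the `scores` and `average` names are illustrative only.

```python
# Benchmark scores copied from the results table in this PR.
scores = {
    "IFEval (0-Shot)": 44.06,
    "BBH (3-Shot)": 47.73,
    "MATH Lvl 5 (4-Shot)": 7.78,
    "GPQA (0-shot)": 10.40,
    "MuSR (0-shot)": 8.73,
    "MMLU-PRO (5-shot)": 36.96,
}

# Assumption: "Avg." is the plain (unweighted) mean of the six scores,
# rounded to two decimal places.
average = round(sum(scores.values()) / len(scores), 2)

print(f"Avg. = {average:.2f}")
```

Under that assumption the computed mean agrees with the 25.94 reported in the table.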