leaderboard-pr-bot committed on
Commit
90b3073
1 Parent(s): 54f816e

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +118 -4
README.md CHANGED
@@ -1,18 +1,18 @@
 ---
 license: mit
-base_model: facebook/bart-large-cnn
 tags:
 - generated_from_trainer
 datasets:
 - xsum
 metrics:
 - rouge
+base_model: facebook/bart-large-cnn
 model-index:
 - name: theus_concepttagger
   results:
   - task:
-      name: Sequence-to-sequence Language Modeling
       type: text2text-generation
+      name: Sequence-to-sequence Language Modeling
     dataset:
       name: xsum
       type: xsum
@@ -20,9 +20,109 @@ model-index:
       split: validation
       args: default
     metrics:
-    - name: Rouge1
-      type: rouge
+    - type: rouge
       value: 34.8663
+      name: Rouge1
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 24.57
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=namanpundir/theus_concepttagger
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 25.5
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=namanpundir/theus_concepttagger
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 23.12
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=namanpundir/theus_concepttagger
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 48.25
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=namanpundir/theus_concepttagger
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 48.3
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=namanpundir/theus_concepttagger
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 0.0
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=namanpundir/theus_concepttagger
+      name: Open LLM Leaderboard
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -77,3 +177,17 @@ The following hyperparameters were used during training:
 - Pytorch 2.0.1+cu118
 - Datasets 2.14.5
 - Tokenizers 0.13.3
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_namanpundir__theus_concepttagger)
+
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |28.29|
+|AI2 Reasoning Challenge (25-Shot)|24.57|
+|HellaSwag (10-Shot)              |25.50|
+|MMLU (5-Shot)                    |23.12|
+|TruthfulQA (0-shot)              |48.25|
+|Winogrande (5-shot)              |48.30|
+|GSM8k (5-shot)                   | 0.00|
+
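
Note on the `Avg.` row: the PR does not say how the average is computed. The sketch below assumes it is the unweighted arithmetic mean of the six benchmark scores, rounded to two decimals, and the numbers in the table are consistent with that assumption. The helper name `leaderboard_average` is hypothetical and not part of any leaderboard tooling.

```python
# Scores copied from the results table added by this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 24.57,
    "HellaSwag (10-Shot)": 25.50,
    "MMLU (5-Shot)": 23.12,
    "TruthfulQA (0-shot)": 48.25,
    "Winogrande (5-shot)": 48.30,
    "GSM8k (5-shot)": 0.00,
}


def leaderboard_average(values):
    """Unweighted arithmetic mean, rounded to two decimals.

    Assumption: this mirrors how the table's "Avg." row was derived;
    the PR itself does not document the formula.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


avg = leaderboard_average(scores.values())
print(f"Avg. {avg:.2f}")  # prints "Avg. 28.29", matching the table
```

Because the mean is unweighted, the 0.00 on GSM8k pulls the average down as much as any other benchmark would.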