leaderboard-pr-bot
committed on
Adding Evaluation Results
This is an automated PR created with https://huggingface.co./spaces/Weyaxi/open-llm-leaderboard-results-pr
The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.
If you encounter any issues, please report them to https://huggingface.co./spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
README.md
CHANGED
@@ -1,24 +1,146 @@
|
1 |
---
|
2 |
-
license: apache-2.0
|
3 |
-
datasets:
|
4 |
-
- WizardLM/WizardLM_evol_instruct_V2_196k
|
5 |
-
- icybee/share_gpt_90k_v1
|
6 |
language:
|
7 |
- en
|
8 |
-
widget:
|
9 |
-
- example_title: "example 1"
|
10 |
-
text: "How do I mount a tv to drywall safely?"
|
11 |
-
output:
|
12 |
-
text: "Mounting a TV to drywall safely is crucial to ensure the longevity of your TV and the wall. Here's a step-by-step guide on how to do it safely:\n 1. Choose the right mount: Choose a mount that is compatible with your TV and the wall. Make sure the mount is rated for drywall and not just wood or concrete.\n 2. Prepare the wall: Before mounting the TV, make sure the wall is clean and free from any debris or paint. Use a stud finder to locate the studs in the wall.\n 3. Install the mount: Follow the manufacturer's instructions for mounting the TV to the wall. Use appropriate hardware and tools to secure the mount to the wall.\n 4. Level the TV: Level the TV on the mount using a spirit level or a leveling kit provided by the mount manufacturer.\n 5. Attach the TV to the mount: Attach the TV to the mount using the appropriate hardware and tools. Tighten the bolts and screws to ensure the TV is securely attached.\n 6. Connect the cables: Connect the TV cables to the appropriate ports on the back of the TV and the mount.\n 7. Test the mount: Test the mount to ensure it's secure and stable. Adjust the mount as needed to ensure the TV is level and secure.\n Mounting a TV to drywall safely is crucial to avoid damaging the wall or the TV. Follow these steps carefully and use appropriate tools and hardware to ensure a secure and stable installation."
|
13 |
-
- example_title: "example 2"
|
14 |
-
text: "Happy is to sad as calm is to _."
|
15 |
-
output:
|
16 |
-
text: "The adjective that can be used to describe the opposite of calm is \"anxious\" or \"stressed.\" So, from happy to sad, we can say that happy is to sad as calm is to anxious or stressed."
|
17 |
library_name: transformers
|
18 |
-
pipeline_tag: text-generation
|
19 |
tags:
|
20 |
- nlp
|
21 |
- llm
|
22 |
---
|
23 |
# AmberChat
|
24 |
|
@@ -148,4 +270,17 @@ You need to build the model once and can just run it afterwards.
|
148 |
archivePrefix={arXiv},
|
149 |
primaryClass={cs.CL}
|
150 |
}
|
151 |
-
```
|
1 |
---
|
2 |
language:
|
3 |
- en
|
4 |
+
license: apache-2.0
|
5 |
library_name: transformers
|
6 |
tags:
|
7 |
- nlp
|
8 |
- llm
|
9 |
+
datasets:
|
10 |
+
- WizardLM/WizardLM_evol_instruct_V2_196k
|
11 |
+
- icybee/share_gpt_90k_v1
|
12 |
+
widget:
|
13 |
+
- example_title: example 1
|
14 |
+
text: How do I mount a tv to drywall safely?
|
15 |
+
output:
|
16 |
+
text: "Mounting a TV to drywall safely is crucial to ensure the longevity of your\
|
17 |
+
\ TV and the wall. Here's a step-by-step guide on how to do it safely:\n 1.\
|
18 |
+
\ Choose the right mount: Choose a mount that is compatible with your TV and\
|
19 |
+
\ the wall. Make sure the mount is rated for drywall and not just wood or concrete.\n\
|
20 |
+
\ 2. Prepare the wall: Before mounting the TV, make sure the wall is clean and\
|
21 |
+
\ free from any debris or paint. Use a stud finder to locate the studs in the\
|
22 |
+
\ wall.\n 3. Install the mount: Follow the manufacturer's instructions for mounting\
|
23 |
+
\ the TV to the wall. Use appropriate hardware and tools to secure the mount\
|
24 |
+
\ to the wall.\n 4. Level the TV: Level the TV on the mount using a spirit level\
|
25 |
+
\ or a leveling kit provided by the mount manufacturer.\n 5. Attach the TV to\
|
26 |
+
\ the mount: Attach the TV to the mount using the appropriate hardware and tools.\
|
27 |
+
\ Tighten the bolts and screws to ensure the TV is securely attached.\n 6. Connect\
|
28 |
+
\ the cables: Connect the TV cables to the appropriate ports on the back of\
|
29 |
+
\ the TV and the mount.\n 7. Test the mount: Test the mount to ensure it's secure\
|
30 |
+
\ and stable. Adjust the mount as needed to ensure the TV is level and secure.\n\
|
31 |
+
\ Mounting a TV to drywall safely is crucial to avoid damaging the wall or the\
|
32 |
+
\ TV. Follow these steps carefully and use appropriate tools and hardware to\
|
33 |
+
\ ensure a secure and stable installation."
|
34 |
+
- example_title: example 2
|
35 |
+
text: Happy is to sad as calm is to _.
|
36 |
+
output:
|
37 |
+
text: The adjective that can be used to describe the opposite of calm is "anxious"
|
38 |
+
or "stressed." So, from happy to sad, we can say that happy is to sad as calm
|
39 |
+
is to anxious or stressed.
|
40 |
+
pipeline_tag: text-generation
|
41 |
+
model-index:
|
42 |
+
- name: AmberChat
|
43 |
+
results:
|
44 |
+
- task:
|
45 |
+
type: text-generation
|
46 |
+
name: Text Generation
|
47 |
+
dataset:
|
48 |
+
name: AI2 Reasoning Challenge (25-Shot)
|
49 |
+
type: ai2_arc
|
50 |
+
config: ARC-Challenge
|
51 |
+
split: test
|
52 |
+
args:
|
53 |
+
num_few_shot: 25
|
54 |
+
metrics:
|
55 |
+
- type: acc_norm
|
56 |
+
value: 42.92
|
57 |
+
name: normalized accuracy
|
58 |
+
source:
|
59 |
+
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LLM360/AmberChat
|
60 |
+
name: Open LLM Leaderboard
|
61 |
+
- task:
|
62 |
+
type: text-generation
|
63 |
+
name: Text Generation
|
64 |
+
dataset:
|
65 |
+
name: HellaSwag (10-Shot)
|
66 |
+
type: hellaswag
|
67 |
+
split: validation
|
68 |
+
args:
|
69 |
+
num_few_shot: 10
|
70 |
+
metrics:
|
71 |
+
- type: acc_norm
|
72 |
+
value: 74.01
|
73 |
+
name: normalized accuracy
|
74 |
+
source:
|
75 |
+
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LLM360/AmberChat
|
76 |
+
name: Open LLM Leaderboard
|
77 |
+
- task:
|
78 |
+
type: text-generation
|
79 |
+
name: Text Generation
|
80 |
+
dataset:
|
81 |
+
name: MMLU (5-Shot)
|
82 |
+
type: cais/mmlu
|
83 |
+
config: all
|
84 |
+
split: test
|
85 |
+
args:
|
86 |
+
num_few_shot: 5
|
87 |
+
metrics:
|
88 |
+
- type: acc
|
89 |
+
value: 38.75
|
90 |
+
name: accuracy
|
91 |
+
source:
|
92 |
+
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LLM360/AmberChat
|
93 |
+
name: Open LLM Leaderboard
|
94 |
+
- task:
|
95 |
+
type: text-generation
|
96 |
+
name: Text Generation
|
97 |
+
dataset:
|
98 |
+
name: TruthfulQA (0-shot)
|
99 |
+
type: truthful_qa
|
100 |
+
config: multiple_choice
|
101 |
+
split: validation
|
102 |
+
args:
|
103 |
+
num_few_shot: 0
|
104 |
+
metrics:
|
105 |
+
- type: mc2
|
106 |
+
value: 41.18
|
107 |
+
source:
|
108 |
+
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LLM360/AmberChat
|
109 |
+
name: Open LLM Leaderboard
|
110 |
+
- task:
|
111 |
+
type: text-generation
|
112 |
+
name: Text Generation
|
113 |
+
dataset:
|
114 |
+
name: Winogrande (5-shot)
|
115 |
+
type: winogrande
|
116 |
+
config: winogrande_xl
|
117 |
+
split: validation
|
118 |
+
args:
|
119 |
+
num_few_shot: 5
|
120 |
+
metrics:
|
121 |
+
- type: acc
|
122 |
+
value: 66.61
|
123 |
+
name: accuracy
|
124 |
+
source:
|
125 |
+
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LLM360/AmberChat
|
126 |
+
name: Open LLM Leaderboard
|
127 |
+
- task:
|
128 |
+
type: text-generation
|
129 |
+
name: Text Generation
|
130 |
+
dataset:
|
131 |
+
name: GSM8k (5-shot)
|
132 |
+
type: gsm8k
|
133 |
+
config: main
|
134 |
+
split: test
|
135 |
+
args:
|
136 |
+
num_few_shot: 5
|
137 |
+
metrics:
|
138 |
+
- type: acc
|
139 |
+
value: 5.53
|
140 |
+
name: accuracy
|
141 |
+
source:
|
142 |
+
url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=LLM360/AmberChat
|
143 |
+
name: Open LLM Leaderboard
|
144 |
---
|
145 |
# AmberChat
|
146 |
|
|
|
270 |
archivePrefix={arXiv},
|
271 |
primaryClass={cs.CL}
|
272 |
}
|
273 |
+
```
|
274 |
+
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
|
275 |
+
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_LLM360__AmberChat)
|
276 |
+
|
277 |
+
| Metric |Value|
|
278 |
+
|---------------------------------|----:|
|
279 |
+
|Avg. |44.84|
|
280 |
+
|AI2 Reasoning Challenge (25-Shot)|42.92|
|
281 |
+
|HellaSwag (10-Shot) |74.01|
|
282 |
+
|MMLU (5-Shot) |38.75|
|
283 |
+
|TruthfulQA (0-shot) |41.18|
|
284 |
+
|Winogrande (5-shot) |66.61|
|
285 |
+
|GSM8k (5-shot) | 5.53|
|
286 |
+
|
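As a quick sanity check on the table added above, the `Avg.` row can be recomputed from the six per-benchmark values. The sketch below is only illustrative: the variable names are hypothetical, and it works from the rounded scores shown in the table. Those rounded values average to roughly 44.83, slightly below the reported 44.84. The small gap is presumably because the leaderboard averages unrounded scores, which is an assumption and not something this PR states.

```python
from statistics import mean

# Per-benchmark scores copied from the table above (already rounded to two decimals).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 42.92,
    "HellaSwag (10-Shot)": 74.01,
    "MMLU (5-Shot)": 38.75,
    "TruthfulQA (0-shot)": 41.18,
    "Winogrande (5-shot)": 66.61,
    "GSM8k (5-shot)": 5.53,
}

# Average reported in the "Avg." row of the table.
reported_avg = 44.84

# Unweighted mean of the six rounded scores.
recomputed_avg = mean(scores.values())

# Absolute difference between the recomputed and the reported average.
diff = abs(recomputed_avg - reported_avg)

print(f"recomputed: {recomputed_avg:.4f}, reported: {reported_avg}, diff: {diff:.4f}")
```

A difference below 0.01 is consistent with rounding alone. A larger gap would suggest that a score was copied incorrectly into the model card.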