grimjim committed on
Commit
0675fc8
1 Parent(s): d57fa20

Update README.md


Added benchmark evaluation results

Files changed (1)
  1. README.md +106 -1
README.md CHANGED
@@ -30,10 +30,115 @@ inference:
  stop:
  - <|end_of_text|>
  - <|eot_id|>
+ model-index:
+ - name: grimjim/llama-3-experiment-v1-9B
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 66.41
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=grimjim/llama-3-experiment-v1-9B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 78.56
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=grimjim/llama-3-experiment-v1-9B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 66.71
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=grimjim/llama-3-experiment-v1-9B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 50.7
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=grimjim/llama-3-experiment-v1-9B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 75.93
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=grimjim/llama-3-experiment-v1-9B
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 65.88
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=grimjim/llama-3-experiment-v1-9B
+       name: Open LLM Leaderboard
  ---
  # llama-3-experiment-v1-9B
 
- This is an experimental merge, replicating additional layers to the model without post-merge healing. There is damage to the model, but it appears to be tolerable as is. The resulting impact on narrative text completion may be of interest.
+ This is an experimental merge, replicating additional layers to the model without post-merge healing.
+ There is damage to the model, but it appears to be tolerable as is; the performance difference in benchmarks from the original 8B Instruct model does not appear to be significant.
+ The resulting impact on narrative text completion may also be of interest.
 
  Light testing performed with instruct prompting and the following sampler settings:
  - temp=1 and minP=0.02
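
The sampler settings above map directly onto a Hugging Face transformers generation call. Below is a minimal sketch of running the merged model with instruct prompting, temp=1, minP=0.02, and the card's two stop tokens; the prompt text is only a placeholder, and min_p sampling assumes a reasonably recent transformers release (roughly 4.39 or later).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "grimjim/llama-3-experiment-v1-9B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Instruct prompting via the Llama 3 chat template; the prompt is a placeholder.
messages = [
    {"role": "user", "content": "Continue the story: The lighthouse had been dark for years."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Stop on either token listed in the card's inference settings.
terminators = [
    tokenizer.convert_tokens_to_ids("<|end_of_text|>"),
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

# Sampler settings from the card: temp=1 and minP=0.02.
output = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.0,
    min_p=0.02,
    eos_token_id=terminators,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```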
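
For background on "replicating additional layers," a layer-replication ("passthrough") self-merge of this kind is commonly produced with mergekit. The commit does not include the merge recipe, so the overlapping layer ranges below are illustrative assumptions only, and the base repo id is inferred from the card's mention of the original 8B Instruct model; this is not the configuration actually used for llama-3-experiment-v1-9B.

```python
# Hypothetical mergekit passthrough config for a layer-replicated self-merge.
# The overlapping slices duplicate a block of transformer layers verbatim,
# with no post-merge healing (no further fine-tuning afterwards).
import yaml

merge_config = {
    "slices": [
        {"sources": [{"model": "meta-llama/Meta-Llama-3-8B-Instruct",
                      "layer_range": [0, 12]}]},
        # Layers 8-11 are repeated via the overlap between the two slices;
        # these ranges are illustrative, not taken from the commit.
        {"sources": [{"model": "meta-llama/Meta-Llama-3-8B-Instruct",
                      "layer_range": [8, 32]}]},
    ],
    "merge_method": "passthrough",
    "dtype": "bfloat16",
}

with open("layer-replication.yml", "w") as f:
    yaml.safe_dump(merge_config, f, sort_keys=False)

# With mergekit installed, the merge would then be run as:
#   mergekit-yaml layer-replication.yml ./merged-9B
```

Because the replicated layers are copied as-is, some degradation is expected, which is consistent with the card's note that the damage appears tolerable and that the benchmark differences from the 8B Instruct original do not appear significant.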