stjohn2007 committed on
Commit
c9d35bb
1 Parent(s): 10f5f63

Update README.md

  1. README.md +37 -5
README.md CHANGED
@@ -52,11 +52,13 @@ This repository provides large language models developed by [TokyoTech-LLM](http
52
 
53
  ### MT-Bench JA
54
 
55
  * NOTE that the models with the `v0.1` suffix are newer versions compared to their original counterparts with the `hf`.
56
  * We report overall (i.e., average over scores of the first and second turns), first, and second turn scores.
57
 
58
-
59
- #### Overall
60
 
61
 
62
  |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
@@ -68,7 +70,7 @@ This repository provides large language models developed by [TokyoTech-LLM](http
68
  | Swallow-70b-instruct-v0.1 |0.4513|0.4822|0.5353|0.3497|0.3492|0.2668|0.5553|0.4955|0.5767|
69
  | Swallow-70b-instruct-hf |0.3259|0.2925|0.4283|0.3447|0.1562|0.1856|0.5634|0.3315|0.3071|
70
 
71
- #### First Turn
72
 
73
  |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
74
  |---|---|---|---|---|---|---|---|---|---|
@@ -79,7 +81,7 @@ This repository provides large language models developed by [TokyoTech-LLM](http
79
  | Swallow-70b-instruct-v0.1 |0.4849|0.5720|0.5020|0.4780|0.3680|0.2467|0.5400|0.5720|0.5960|
80
  | Swallow-70b-instruct-hf |0.3631|0.3420|0.4007|0.4220|0.1580|0.2044|0.6120|0.4280|0.3360|
81
 
82
- #### Second Turn
83
 
84
  |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
85
  |---|---|---|---|---|---|---|---|---|---|
@@ -90,6 +92,36 @@ This repository provides large language models developed by [TokyoTech-LLM](http
90
  | Swallow-70b-instruct-v0.1 |0.4179|0.3913|0.5689|0.2184|0.3280|0.2884|0.5711|0.4171|0.5562|
91
  | Swallow-70b-instruct-hf |0.2872|0.2398|0.4564|0.2647|0.1540|0.1676|0.5118|0.2311|0.2762|
92
 
93
 
94
  ## Evaluation Benchmarks
95
 
@@ -121,7 +153,7 @@ This format must be adhered to strictly, as deviations may result in less optima
121
  The template used to construct a prompt for the Instruct model is specified as follows:
122
 
123
  ```
124
- <s>[INST] <<SYS>>\n{Instruction}\n<</SYS>>\n\n{USER_MESSAGE_1} [/INST] {BOT_MESSAGE_1} </s>[INST] {USER_MESSAGE_2}[/INST]
125
  ```
126
 
127
  Please be aware that ``<s>`` and ``</s>`` are special tokens used for the beginning of string (BOS) and end of string (EOS), respectively, while [INST] and [/INST] are considered regular strings.
 
52
 
53
  ### MT-Bench JA
54
 
55
+
56
+ #### Comparison to the past version
57
+
58
  * NOTE that the models with the `v0.1` suffix are newer versions compared to their original counterparts with the `hf`.
59
  * We report overall (i.e., average over scores of the first and second turns), first, and second turn scores.
60
 
61
+ ##### Overall
 
62
 
63
 
64
  |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
 
70
  | Swallow-70b-instruct-v0.1 |0.4513|0.4822|0.5353|0.3497|0.3492|0.2668|0.5553|0.4955|0.5767|
71
  | Swallow-70b-instruct-hf |0.3259|0.2925|0.4283|0.3447|0.1562|0.1856|0.5634|0.3315|0.3071|
72
 
73
+ ##### First Turn
74
 
75
  |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
76
  |---|---|---|---|---|---|---|---|---|---|
 
81
  | Swallow-70b-instruct-v0.1 |0.4849|0.5720|0.5020|0.4780|0.3680|0.2467|0.5400|0.5720|0.5960|
82
  | Swallow-70b-instruct-hf |0.3631|0.3420|0.4007|0.4220|0.1580|0.2044|0.6120|0.4280|0.3360|
83
 
84
+ ##### Second Turn
85
 
86
  |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
87
  |---|---|---|---|---|---|---|---|---|---|
 
92
  | Swallow-70b-instruct-v0.1 |0.4179|0.3913|0.5689|0.2184|0.3280|0.2884|0.5711|0.4171|0.5562|
93
  | Swallow-70b-instruct-hf |0.2872|0.2398|0.4564|0.2647|0.1540|0.1676|0.5118|0.2311|0.2762|
94
 
95
+ #### Comparison to the existing models
96
+
97
+ We only provide the overall score in this section.
98
+
99
+ ##### 7B models
100
+
101
+ |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
102
+ |---|---|---|---|---|---|---|---|---|---|
103
+ | Swallow-7b-instruct-v0.1 |0.3435|0.4450|0.4720|0.1853|0.1920|0.2204|0.3015|0.4594|0.4720|
104
+ | ELYZA-japanese-Llama-2-7b-fast-instruct |0.2827|0.3289|0.3907|0.2424|0.1480|0.1584|0.3511|0.3053|0.3365|
105
+ | calm2-7b-chat |0.3204|0.4657|0.4898|0.1837|0.1005|0.1414|0.3927|0.3601|0.4293|
106
+ | calm2-7b-chat-dpo-experimental |0.3493|0.5312|0.5237|0.1857|0.1000|0.1813|0.3355|0.4320|0.5051|
107
+ | RakutenAI-7B-instruct |0.2994|0.3623|0.3711|0.3333|0.1763|0.1581|0.4215|0.2824|0.2901|
108
+ | RakutenAI-7B-chat |0.3667|0.4229|0.4644|0.3990|0.2161|0.2390|0.3416|0.3904|0.4601|
109
+
110
+ ##### 13B models
111
+
112
+ |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
113
+ |---|---|---|---|---|---|---|---|---|---|
114
+ | Swallow-13b-instruct-v0.1 |0.3669|0.4816|0.5562|0.2769|0.1020|0.1505|0.4179|0.4347|0.5150|
115
+ | ELYZA-japanese-Llama-2-13b-instruct |0.3196|0.4400|0.4373|0.2098|0.2157|0.1572|0.3583|0.3243|0.4141|
116
+ | ELYZA-japanese-Llama-2-13b-fast-instruct |0.3042|0.3729|0.3930|0.1236|0.2492|0.1862|0.4360|0.3233|0.3496|
117
+
118
+ ##### 70B models
119
+
120
+ |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
121
+ |---|---|---|---|---|---|---|---|---|---|
122
+ | Swallow-70b-instruct-v0.1 |0.4513|0.4822|0.5353|0.3497|0.3492|0.2668|0.5553|0.4955|0.5767|
123
+ | japanese-stablelm-instruct-beta-70b |0.3716|0.4179|0.3945|0.3656|0.2580|0.2186|0.4412|0.4663|0.4103|
124
+
125
 
126
  ## Evaluation Benchmarks
127
 
 
153
  The template used to construct a prompt for the Instruct model is specified as follows:
154
 
155
  ```
156
+ <s>[INST] <<SYS>>\n{Instruction}\n<</SYS>>\n\n{USER_MESSAGE_1} [INST] {BOT_MESSAGE_1} </s>[INST] {USER_MESSAGE_2}[/INST]
157
  ```
158
 
159
  Please be aware that ``<s>`` and ``</s>`` are special tokens used for the beginning of string (BOS) and end of string (EOS), respectively, while [INST] and [/INST] are considered regular strings.
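
The two-turn template can be sketched as a small Python helper. This is only an illustration, not code from the repository: the function name `build_two_turn_prompt` and its parameter names are hypothetical, and the `\n` sequences in the template are assumed to stand for real newline characters. The sketch closes the first user turn with `[/INST]`, as in the removed line of this diff and in the Llama 2 chat convention; the added line shows `[INST]` in that position.

```python
def build_two_turn_prompt(
    instruction: str,
    user_message_1: str,
    bot_message_1: str,
    user_message_2: str,
) -> str:
    """Fill the two-turn Instruct template with concrete strings.

    Hypothetical helper for illustration. Assumes the first user turn
    is closed with [/INST] and that "\n" in the template means a newline.
    """
    return (
        # System block and first user turn.
        f"<s>[INST] <<SYS>>\n{instruction}\n<</SYS>>\n\n"
        # First model reply, terminated by the EOS marker.
        f"{user_message_1} [/INST] {bot_message_1} </s>"
        # Second user turn; the template has no space before [/INST] here.
        f"[INST] {user_message_2}[/INST]"
    )


if __name__ == "__main__":
    prompt = build_two_turn_prompt(
        "You are a helpful assistant.",
        "Hello.",
        "Hi, how can I help?",
        "Tell me about Tokyo.",
    )
    print(prompt)
```

The helper writes `<s>` and `</s>` as literal text. Whether a given tokenizer also inserts BOS on its own is not covered here, so check the tokenized output before relying on this string.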