stjohn2007 committed on
Commit c9d35bb
Parent(s): 10f5f63

Update README.md

README.md CHANGED
@@ -52,11 +52,13 @@ This repository provides large language models developed by [TokyoTech-LLM](http
 
 ### MT-Bench JA
 
+
+#### Comparison to the past version
+
 * NOTE that the models with the `v0.1` suffix are newer versions compared to their original counterparts with the `hf`.
 * We report overall (i.e., average over scores of the first and second turns), first, and second turn scores.
 
-
-#### Overall
+##### Overall
 
 
 |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
@@ -68,7 +70,7 @@ This repository provides large language models developed by [TokyoTech-LLM](http
 | Swallow-70b-instruct-v0.1 |0.4513|0.4822|0.5353|0.3497|0.3492|0.2668|0.5553|0.4955|0.5767|
 | Swallow-70b-instruct-hf |0.3259|0.2925|0.4283|0.3447|0.1562|0.1856|0.5634|0.3315|0.3071|
 
-
+##### First Turn
 
 |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
 |---|---|---|---|---|---|---|---|---|---|
@@ -79,7 +81,7 @@ This repository provides large language models developed by [TokyoTech-LLM](http
 | Swallow-70b-instruct-v0.1 |0.4849|0.5720|0.5020|0.4780|0.3680|0.2467|0.5400|0.5720|0.5960|
 | Swallow-70b-instruct-hf |0.3631|0.3420|0.4007|0.4220|0.1580|0.2044|0.6120|0.4280|0.3360|
 
-
+##### Second Turn
 
 |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
 |---|---|---|---|---|---|---|---|---|---|
@@ -90,6 +92,36 @@ This repository provides large language models developed by [TokyoTech-LLM](http
 | Swallow-70b-instruct-v0.1 |0.4179|0.3913|0.5689|0.2184|0.3280|0.2884|0.5711|0.4171|0.5562|
 | Swallow-70b-instruct-hf |0.2872|0.2398|0.4564|0.2647|0.1540|0.1676|0.5118|0.2311|0.2762|
 
+#### Comparison to the existing models
+
+We only provide the overall score in this section.
+
+##### 7B models
+
+|Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
+|---|---|---|---|---|---|---|---|---|---|
+| Swallow-7b-instruct-v0.1 |0.3435|0.4450|0.4720|0.1853|0.1920|0.2204|0.3015|0.4594|0.4720|
+| ELYZA-japanese-Llama-2-7b-fast-instruct |0.2827|0.3289|0.3907|0.2424|0.1480|0.1584|0.3511|0.3053|0.3365|
+| calm2-7b-chat |0.3204|0.4657|0.4898|0.1837|0.1005|0.1414|0.3927|0.3601|0.4293|
+| calm2-7b-chat-dpo-experimental |0.3493|0.5312|0.5237|0.1857|0.1000|0.1813|0.3355|0.4320|0.5051|
+| RakutenAI-7B-instruct |0.2994|0.3623|0.3711|0.3333|0.1763|0.1581|0.4215|0.2824|0.2901|
+| RakutenAI-7B-chat |0.3667|0.4229|0.4644|0.3990|0.2161|0.2390|0.3416|0.3904|0.4601|
+
+##### 13B models
+
+|Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
+|---|---|---|---|---|---|---|---|---|---|
+| Swallow-13b-instruct-v0.1 |0.3669|0.4816|0.5562|0.2769|0.1020|0.1505|0.4179|0.4347|0.5150|
+| ELYZA-japanese-Llama-2-13b-instruct |0.3196|0.4400|0.4373|0.2098|0.2157|0.1572|0.3583|0.3243|0.4141|
+| ELYZA-japanese-Llama-2-13b-fast-instruct |0.3042|0.3729|0.3930|0.1236|0.2492|0.1862|0.4360|0.3233|0.3496|
+
+##### 70B models
+
+|Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
+|---|---|---|---|---|---|---|---|---|---|
+| Swallow-70b-instruct-v0.1 |0.4513|0.4822|0.5353|0.3497|0.3492|0.2668|0.5553|0.4955|0.5767|
+| japanese-stablelm-instruct-beta-70b |0.3716|0.4179|0.3945|0.3656|0.2580|0.2186|0.4412|0.4663|0.4103|
+
 
 ## Evaluation Benchmarks
 
@@ -121,7 +153,7 @@ This format must be adhered to strictly, as deviations may result in less optima
 The template used to construct a prompt for the Instruct model is specified as follows:
 
 ```
-<s>[INST] <<SYS>>\n{Instruction}\n<</SYS>>\n\n{USER_MESSAGE_1} [
+<s>[INST] <<SYS>>\n{Instruction}\n<</SYS>>\n\n{USER_MESSAGE_1} [INST] {BOT_MESSAGE_1} </s>[INST] {USER_MESSAGE_2}[/INST]
 ```
 
 Please be aware that ``<s>`` and ``</s>`` are special tokens used for the beginning of string (BOS) and end of string (EOS), respectively, while [INST] and [/INST] are considered regular strings.
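The MT-Bench note in this diff defines the overall score as the average over the scores of the first and second turns. Suppose that phrase means a pooled average over every judged answer from both turns. That reading is an assumption, since the diff does not say how the scores were aggregated. Under it, the overall score equals the plain mean of the two per-turn averages only when both turns contribute the same number of scores. The sketch below shows this with made-up scores. `pooled_overall` and `mean_of_turn_averages` are hypothetical helpers, not the evaluation code behind these tables.

```python
from statistics import mean


def pooled_overall(first_turn: list[float], second_turn: list[float]) -> float:
    """Hypothetical helper: average over all scores from both turns taken together."""
    return mean(first_turn + second_turn)


def mean_of_turn_averages(first_turn: list[float], second_turn: list[float]) -> float:
    """Hypothetical helper: plain mean of the two per-turn averages."""
    return (mean(first_turn) + mean(second_turn)) / 2


# Made-up scores for illustration only; they are not taken from the tables.
first = [0.6, 0.4, 0.5, 0.3]
second = [0.5, 0.3, 0.4, 0.2]

# Same number of scores in each turn: both readings agree.
print(round(pooled_overall(first, second), 4))
print(round(mean_of_turn_averages(first, second), 4))

# One second-turn score missing: the two readings diverge.
print(round(pooled_overall(first, second[:3]), 4))
print(round(mean_of_turn_averages(first, second[:3]), 4))
```

With four scores per turn, both helpers return 0.4. Dropping one second-turn score gives 0.4286 for the pooled average and 0.425 for the mean of turn averages.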
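The prompt template added in this commit can be filled in with ordinary string formatting. The sketch below copies the committed template line verbatim, tag placement included. It assumes that each `\n` written in the README stands for a newline character. `build_prompt` and `split_special_tokens` are hypothetical helpers. The second one only illustrates the README's note that `<s>` and `</s>` are special tokens while `[INST]` and `[/INST]` are regular strings. It is not how any particular tokenizer is implemented.

```python
import re

# Template copied verbatim from the line added in this commit.
# Assumption: each "\n" in the README means an actual newline character.
TEMPLATE = (
    "<s>[INST] <<SYS>>\n{Instruction}\n<</SYS>>\n\n{USER_MESSAGE_1} [INST] "
    "{BOT_MESSAGE_1} </s>[INST] {USER_MESSAGE_2}[/INST]"
)

# Only BOS and EOS are treated as special, per the README note.
SPECIAL_TOKENS = ("<s>", "</s>")


def build_prompt(instruction: str, user_1: str, bot_1: str, user_2: str) -> str:
    """Hypothetical helper: fill the template's placeholders."""
    return TEMPLATE.format(
        Instruction=instruction,
        USER_MESSAGE_1=user_1,
        BOT_MESSAGE_1=bot_1,
        USER_MESSAGE_2=user_2,
    )


def split_special_tokens(prompt: str) -> list[tuple[str, str]]:
    """Hypothetical helper: separate BOS/EOS markers from regular text.

    [INST] and [/INST] stay inside the "text" segments because the README
    treats them as regular strings.
    """
    # The capturing group makes re.split keep the matched special tokens.
    pattern = "(" + "|".join(re.escape(t) for t in SPECIAL_TOKENS) + ")"
    segments = []
    for part in re.split(pattern, prompt):
        if not part:
            continue  # re.split yields an empty string before a leading match
        kind = "special" if part in SPECIAL_TOKENS else "text"
        segments.append((kind, part))
    return segments


prompt = build_prompt("You are a helpful assistant.", "Hello", "Hi!", "How are you?")
for kind, text in split_special_tokens(prompt):
    print(kind, repr(text))
```

Running it prints four segments. `<s>` and `</s>` come out as special, and two text segments still contain `[INST]` and `[/INST]`. `<<SYS>>` and `<</SYS>>` are not split, because the match is case-sensitive.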