tokyotech-llm/Swallow-13b-instruct-v0.1

@@ -52,11 +52,13 @@ This repository provides large language models developed by [TokyoTech-LLM](http
 ### MT-Bench JA
 * Note that models with the `v0.1` suffix are newer versions of their original counterparts, which carry the `hf` suffix.
 * We report overall (i.e., the average of the first- and second-turn scores), first-turn, and second-turn scores.
-#### Overall
 |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
@@ -68,7 +70,7 @@ This repository provides large language models developed by [TokyoTech-LLM](http
 | Swallow-70b-instruct-v0.1 |0.4513|0.4822|0.5353|0.3497|0.3492|0.2668|0.5553|0.4955|0.5767|
 | Swallow-70b-instruct-hf |0.3259|0.2925|0.4283|0.3447|0.1562|0.1856|0.5634|0.3315|0.3071|
-#### First Turn
 |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
 |---|---|---|---|---|---|---|---|---|---|
@@ -79,7 +81,7 @@ This repository provides large language models developed by [TokyoTech-LLM](http
 | Swallow-70b-instruct-v0.1 |0.4849|0.5720|0.5020|0.4780|0.3680|0.2467|0.5400|0.5720|0.5960|
 | Swallow-70b-instruct-hf |0.3631|0.3420|0.4007|0.4220|0.1580|0.2044|0.6120|0.4280|0.3360|
-#### Second Turn
 |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
 |---|---|---|---|---|---|---|---|---|---|
@@ -90,6 +92,36 @@ This repository provides large language models developed by [TokyoTech-LLM](http
 | Swallow-70b-instruct-v0.1 |0.4179|0.3913|0.5689|0.2184|0.3280|0.2884|0.5711|0.4171|0.5562|
 | Swallow-70b-instruct-hf |0.2872|0.2398|0.4564|0.2647|0.1540|0.1676|0.5118|0.2311|0.2762|
 ## Evaluation Benchmarks
@@ -121,7 +153,7 @@ This format must be adhered to strictly, as deviations may result in less optima
 The template used to construct a prompt for the Instruct model is specified as follows:
 ```
-<s>[INST] <<SYS>>\n{Instruction}\n<</SYS>>\n\n{USER_MESSAGE_1} [/INST] {BOT_MESSAGE_1} </s>[INST] {USER_MESSAGE_2}[/INST]
 ```
 Please be aware that ``<s>`` and ``</s>`` are special tokens used for the beginning of string (BOS) and end of string (EOS), respectively, while [INST] and [/INST] are considered regular strings.

 ### MT-Bench JA
+#### Comparison to the past version
 * Note that models with the `v0.1` suffix are newer versions of their original counterparts, which carry the `hf` suffix.
 * We report overall (i.e., the average of the first- and second-turn scores), first-turn, and second-turn scores.
+##### Overall
 |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
 | Swallow-70b-instruct-v0.1 |0.4513|0.4822|0.5353|0.3497|0.3492|0.2668|0.5553|0.4955|0.5767|
 | Swallow-70b-instruct-hf |0.3259|0.2925|0.4283|0.3447|0.1562|0.1856|0.5634|0.3315|0.3071|
+##### First Turn
 |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
 |---|---|---|---|---|---|---|---|---|---|
 | Swallow-70b-instruct-v0.1 |0.4849|0.5720|0.5020|0.4780|0.3680|0.2467|0.5400|0.5720|0.5960|
 | Swallow-70b-instruct-hf |0.3631|0.3420|0.4007|0.4220|0.1580|0.2044|0.6120|0.4280|0.3360|
+##### Second Turn
 |Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
 |---|---|---|---|---|---|---|---|---|---|
 | Swallow-70b-instruct-v0.1 |0.4179|0.3913|0.5689|0.2184|0.3280|0.2884|0.5711|0.4171|0.5562|
 | Swallow-70b-instruct-hf |0.2872|0.2398|0.4564|0.2647|0.1540|0.1676|0.5118|0.2311|0.2762|
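
The overall score is described above as the average of the first- and second-turn scores. The following is a quick check under that reading, using only the Average column of the three tables in this section. The dictionary and function names are illustrative and not part of the repository. The simple mean agrees with the reported overall values to within 0.001 rather than exactly.

```python
# Average-column values copied from the First Turn, Second Turn, and Overall tables.
first_turn = {
    "Swallow-70b-instruct-v0.1": 0.4849,
    "Swallow-70b-instruct-hf": 0.3631,
}
second_turn = {
    "Swallow-70b-instruct-v0.1": 0.4179,
    "Swallow-70b-instruct-hf": 0.2872,
}
reported_overall = {
    "Swallow-70b-instruct-v0.1": 0.4513,
    "Swallow-70b-instruct-hf": 0.3259,
}


def mean_of_turns(model: str) -> float:
    """Simple mean of the first- and second-turn Average scores."""
    return (first_turn[model] + second_turn[model]) / 2


def gap_to_reported(model: str) -> float:
    """Absolute difference between the simple mean and the reported overall score."""
    return abs(mean_of_turns(model) - reported_overall[model])


if __name__ == "__main__":
    for name in reported_overall:
        print(f"{name}: mean={mean_of_turns(name):.4f} "
              f"reported={reported_overall[name]:.4f} "
              f"gap={gap_to_reported(name):.4f}")
```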
+#### Comparison to the existing models
+We only provide the overall score in this section.
+##### 7B models
+|Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
+|---|---|---|---|---|---|---|---|---|---|
+| Swallow-7b-instruct-v0.1 |0.3435|0.4450|0.4720|0.1853|0.1920|0.2204|0.3015|0.4594|0.4720|
+| ELYZA-japanese-Llama-2-7b-fast-instruct |0.2827|0.3289|0.3907|0.2424|0.1480|0.1584|0.3511|0.3053|0.3365|
+| calm2-7b-chat |0.3204|0.4657|0.4898|0.1837|0.1005|0.1414|0.3927|0.3601|0.4293|
+| calm2-7b-chat-dpo-experimental |0.3493|0.5312|0.5237|0.1857|0.1000|0.1813|0.3355|0.4320|0.5051|
+| RakutenAI-7B-instruct |0.2994|0.3623|0.3711|0.3333|0.1763|0.1581|0.4215|0.2824|0.2901|
+| RakutenAI-7B-chat |0.3667|0.4229|0.4644|0.3990|0.2161|0.2390|0.3416|0.3904|0.4601|
+##### 13B models
+|Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
+|---|---|---|---|---|---|---|---|---|---|
+| Swallow-13b-instruct-v0.1 |0.3669|0.4816|0.5562|0.2769|0.1020|0.1505|0.4179|0.4347|0.5150|
+| ELYZA-japanese-Llama-2-13b-instruct |0.3196|0.4400|0.4373|0.2098|0.2157|0.1572|0.3583|0.3243|0.4141|
+| ELYZA-japanese-Llama-2-13b-fast-instruct |0.3042|0.3729|0.3930|0.1236|0.2492|0.1862|0.4360|0.3233|0.3496|
+##### 70B models
+|Model|Average|Writing|Roleplay|Reasoning|Math|Coding|Extraction|STEM|Humanities|
+|---|---|---|---|---|---|---|---|---|---|
+| Swallow-70b-instruct-v0.1 |0.4513|0.4822|0.5353|0.3497|0.3492|0.2668|0.5553|0.4955|0.5767|
+| japanese-stablelm-instruct-beta-70b |0.3716|0.4179|0.3945|0.3656|0.2580|0.2186|0.4412|0.4663|0.4103|
 ## Evaluation Benchmarks
 The template used to construct a prompt for the Instruct model is specified as follows:
 ```
+<s>[INST] <<SYS>>\n{Instruction}\n<</SYS>>\n\n{USER_MESSAGE_1} [INST] {BOT_MESSAGE_1} </s>[INST] {USER_MESSAGE_2}[/INST]
 ```
 Please be aware that ``<s>`` and ``</s>`` are special tokens used for the beginning of string (BOS) and end of string (EOS), respectively, while [INST] and [/INST] are considered regular strings.
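
A minimal sketch of assembling a prompt string in this format. It assumes that every user turn is closed with `[/INST]`, as in the Llama-2 chat convention and the final turn of the template, and that the `\n` sequences stand for actual newlines. `build_prompt` and its parameters are hypothetical names, not part of the repository. The single-turn spacing (no space before the final `[/INST]`) is also an assumption, carried over from the last turn of the template.

```python
from typing import List, Tuple


def build_prompt(instruction: str,
                 history: List[Tuple[str, str]],
                 user_message: str) -> str:
    """Build a multi-turn prompt string.

    instruction  -- system instruction placed inside <<SYS>> ... <</SYS>>
    history      -- completed (user_message, bot_message) turns, oldest first
    user_message -- the new user message awaiting a response
    """
    # The system block is attached to the first user message only.
    system_block = f"<<SYS>>\n{instruction}\n<</SYS>>\n\n"

    prompt = "<s>"  # literal BOS marker, as written in the template
    is_first_turn = True
    for user, bot in history:
        content = system_block + user if is_first_turn else user
        # A completed turn ends with the bot reply followed by the EOS marker.
        prompt += f"[INST] {content} [/INST] {bot} </s>"
        is_first_turn = False

    content = system_block + user_message if is_first_turn else user_message
    # The open turn ends with [/INST]; the model continues from here.
    prompt += f"[INST] {content}[/INST]"
    return prompt


if __name__ == "__main__":
    print(build_prompt("You are a helpful assistant.",
                       [("Hello.", "Hi, how can I help?")],
                       "Tell me about Tokyo."))
```

Because the string already contains a literal `<s>`, a tokenizer that also prepends BOS automatically would produce a duplicated BOS token, so it is worth checking the tokenized output before use.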