fukugawa committed 1b8053a (parent: cc04067): Update README.md

Files changed (1): README.md (+15, -3)
This is a JAX/Flax-based transformer language model trained on a Japanese dataset. It is based on the official Flax example code ([lm1b](https://github.com/google/flax/tree/main/examples/lm1b)).

## Update Log

* 2024/05/20 Added JGLUE 4-task benchmark scores.
* 2024/05/13 FlaxAutoModelForCausalLM is now supported, with custom model code added (see the loading sketch below).

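Because the checkpoint ships custom model code, loading it through the AutoModel API presumably requires `trust_remote_code=True`. A minimal sketch, assuming the model id `fukugawa/transformer-lm-japanese-0.1b` (the Usage section below remains the authoritative example):

```python
# Minimal loading sketch. Assumptions: the model id is
# fukugawa/transformer-lm-japanese-0.1b, and trust_remote_code=True is
# needed because the repository ships custom model code.
from transformers import AutoTokenizer, FlaxAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "fukugawa/transformer-lm-japanese-0.1b", trust_remote_code=True
)
model = FlaxAutoModelForCausalLM.from_pretrained(
    "fukugawa/transformer-lm-japanese-0.1b", trust_remote_code=True
)

inputs = tokenizer("日本の首都は", return_tensors="np")  # "The capital of Japan is"
outputs = model.generate(inputs["input_ids"], max_length=32)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))
```
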
## Source Code

We've modified Flax's 'lm1b' example to train on a Japanese dataset.

| Model | Params | Layers | Dim | Heads | PPL | Dataset | Training time |
|-|-|-|-|-|-|-|-|
| transformer-lm-japanese-0.1b | 0.1B | 12 | 768 | 12 | 35.22 | wiki40b/ja | 1.5 days |

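Since the model is built from the Flax lm1b example, the table's architecture columns map onto fields of the example's `ml_collections` config. The sketch below uses the field names from the upstream `configs/default.py`; the layer, width, and head counts come from the table row, while `mlp_dim` is an assumed value, not one published here:

```python
# Sketch: the 0.1b row expressed with the lm1b example's config field names
# (from flax/examples/lm1b/configs/default.py). Table values are used as-is;
# mlp_dim is an assumption (the common 4 * emb_dim feed-forward width).
import ml_collections


def get_config() -> ml_collections.ConfigDict:
    config = ml_collections.ConfigDict()
    config.num_layers = 12  # "Layers" column
    config.emb_dim = 768    # "Dim" column: embedding / model width
    config.qkv_dim = 768    # attention width, equal to emb_dim upstream
    config.num_heads = 12   # "Heads" column
    config.mlp_dim = 3072   # assumed: 4 * emb_dim feed-forward width
    return config
```
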
## Benchmarking

* **JGLUE 4-task (2024/05/20)**

  - *We used the [Stability-AI/lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness) library for evaluation.*
  - *We modified the harness to work with FlaxAutoModel so that JAX/Flax models can be evaluated. See the code [here](https://github.com/FookieMonster/lm-evaluation-harness).*
  - *We evaluated four tasks: JCommonsenseQA-1.1, JNLI-1.3, MARC-ja-1.1, and JSQuAD-1.1.*
  - *All evaluations used version 0.3 of the prompt template and were zero-shot (the few-shot count is 0 for each of the four tasks); a sketch of the invocation follows the results table below.*

| Model | Average | JCommonsenseQA | JNLI | MARC-ja | JSQuAD |
| :-- | :-- | :-- | :-- | :-- | :-- |
| transformer-lm-japanese-0.1b | 41.19 | 25.47 | 45.60 | 85.46 | 8.24 |
| Reference: rinna/japanese-gpt-neox-small | 40.75 | 40.39 | 29.13 | 85.48 | 8.02 |

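The invocation sketch promised above, using the harness's Python entry point. This is not the authors' exact command: the task strings follow the fork's `<task>-<data version>-<prompt version>` naming as inferred from the bullets, the Flax adapter name is a placeholder for whatever the modified harness registers, and the model id is assumed from this repository:

```python
# Sketch of a zero-shot JGLUE run with lm-evaluation-harness (v0.3-era API).
# Everything marked as assumed/placeholder below should be checked against
# the modified harness before running.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="flax-causal",  # placeholder: adapter name the modified harness registers
    model_args="pretrained=fukugawa/transformer-lm-japanese-0.1b",
    tasks=[
        "jcommonsenseqa-1.1-0.3",  # task names assumed from the versions above
        "jnli-1.3-0.3",
        "marc_ja-1.1-0.3",
        "jsquad-1.1-0.3",
    ],
    num_fewshot=0,  # all four tasks were evaluated zero-shot
)
print(results["results"])
```
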
## Usage: FlaxAutoModel