fukugawa committed 1b8053a (parent: cc04067): Update README.md

Files changed (1): README.md (+15, -3)
This is a JAX/Flax-based transformer language model trained on a Japanese dataset. It is based on the official Flax example code ([lm1b](https://github.com/google/flax/tree/main/examples/lm1b)).

## Update Log

* 2024/05/20 Added JGLUE 4-task benchmark scores.
* 2024/05/13 FlaxAutoModelForCausalLM is now supported, with custom model code added (see the loading sketch below).

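Because the checkpoint ships custom model code, loading it through the AutoModel API presumably requires `trust_remote_code=True`. A minimal sketch, assuming the model id `fukugawa/transformer-lm-japanese-0.1b` (the Usage section below remains the authoritative example):

```python
# Minimal loading sketch. Assumptions: the model id is
# fukugawa/transformer-lm-japanese-0.1b, and trust_remote_code=True is
# needed because the repository ships custom model code.
from transformers import AutoTokenizer, FlaxAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    "fukugawa/transformer-lm-japanese-0.1b", trust_remote_code=True
)
model = FlaxAutoModelForCausalLM.from_pretrained(
    "fukugawa/transformer-lm-japanese-0.1b", trust_remote_code=True
)

inputs = tokenizer("日本の首都は", return_tensors="np")  # "The capital of Japan is"
outputs = model.generate(inputs["input_ids"], max_length=32)
print(tokenizer.decode(outputs.sequences[0], skip_special_tokens=True))
```
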
## Source Code

We've modified Flax's 'lm1b' example to train on a Japanese dataset.

| Model | Params | Layers | Dim | Heads | PPL | Dataset | Training time |
|-|-|-|-|-|-|-|-|
| transformer-lm-japanese-0.1b | 0.1B | 12 | 768 | 12 | 35.22 | wiki40b/ja | 1.5 days |

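Since the model is built from the Flax lm1b example, the table's architecture columns map onto fields of the example's `ml_collections` config. The sketch below uses the field names from the upstream `configs/default.py`; the layer, width, and head counts come from the table row, while `mlp_dim` is an assumed value, not one published here:

```python
# Sketch: the 0.1b row expressed with the lm1b example's config field names
# (from flax/examples/lm1b/configs/default.py). Table values are used as-is;
# mlp_dim is an assumption (the common 4 * emb_dim feed-forward width).
import ml_collections


def get_config() -> ml_collections.ConfigDict:
    config = ml_collections.ConfigDict()
    config.num_layers = 12  # "Layers" column
    config.emb_dim = 768    # "Dim" column: embedding / model width
    config.qkv_dim = 768    # attention width, equal to emb_dim upstream
    config.num_heads = 12   # "Heads" column
    config.mlp_dim = 3072   # assumed: 4 * emb_dim feed-forward width
    return config
```
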
## Benchmarking

* **JGLUE 4-task (2024/05/20)**

  - *We used the [Stability-AI/lm-evaluation-harness](https://github.com/Stability-AI/lm-evaluation-harness) library for evaluation.*
  - *We modified the harness to work with FlaxAutoModel so that JAX/Flax models can be evaluated. See the code [here](https://github.com/FookieMonster/lm-evaluation-harness).*
  - *We evaluated four tasks: JCommonsenseQA-1.1, JNLI-1.3, MARC-ja-1.1, and JSQuAD-1.1.*
  - *All evaluations used version 0.3 of the prompt template and were zero-shot (the few-shot count is 0 for each of the four tasks); a sketch of the invocation follows the results table below.*

| Model | Average | JCommonsenseQA | JNLI | MARC-ja | JSQuAD |
| :-- | :-- | :-- | :-- | :-- | :-- |
| transformer-lm-japanese-0.1b | 41.19 | 25.47 | 45.60 | 85.46 | 8.24 |
| Reference: rinna/japanese-gpt-neox-small | 40.75 | 40.39 | 29.13 | 85.48 | 8.02 |

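The invocation sketch promised above, using the harness's Python entry point. This is not the authors' exact command: the task strings follow the fork's `<task>-<data version>-<prompt version>` naming as inferred from the bullets, the Flax adapter name is a placeholder for whatever the modified harness registers, and the model id is assumed from this repository:

```python
# Sketch of a zero-shot JGLUE run with lm-evaluation-harness (v0.3-era API).
# Everything marked as assumed/placeholder below should be checked against
# the modified harness before running.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="flax-causal",  # placeholder: adapter name the modified harness registers
    model_args="pretrained=fukugawa/transformer-lm-japanese-0.1b",
    tasks=[
        "jcommonsenseqa-1.1-0.3",  # task names assumed from the versions above
        "jnli-1.3-0.3",
        "marc_ja-1.1-0.3",
        "jsquad-1.1-0.3",
    ],
    num_fewshot=0,  # all four tasks were evaluated zero-shot
)
print(results["results"])
```
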
## Usage: FlaxAutoModel