Update README.md
README.md
@@ -30,8 +30,11 @@ We've modified Flax's 'lm1b' example to train on Japanese dataset. You can find
 | Model | Params | Layers | Dim | Heads | PPL | Dataset | Training time |
 |-|-|-|-|-|-|-|-|
 | lm1b-default | 0.05B | 6 | 512 | 8 | 22.67 | lm1b | 0.5 days |
+| transformer-lm-japanese-default | 0.05B | 6 | 512 | 8 | 66.38 | cc100/ja | 0.5 days |
 | transformer-lm-japanese-0.1b | 0.1B | 12 | 768 | 12 | 35.22 | wiki40b/ja | 1.5 days |
 
+![tensor-board](./tensorboard-v1.png)
+
 ## Usage
 
 Here, we explain the procedure to generate text from pretrained weights using a CPU. We used the following instance on GCE for the Python 3.8 environment.
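The context line above mentions generating text from pretrained weights on a CPU. The actual procedure lives in the repository, not in this diff; as a rough illustration only, the decoding loop such checkpoints are typically used with can be sketched in plain Python. Everything here is a stand-in: `toy_logits` replaces the real Flax model, and the vocabulary, names, and token ids are made up for the example.

```python
def toy_logits(tokens):
    """Stand-in for a language model: scores each id in a 5-token vocab.

    The highest score goes to the id that cyclically follows the last
    token, so greedy decoding produces a predictable sequence.
    """
    last = tokens[-1]
    return [1.0 if i == (last + 1) % 5 else 0.0 for i in range(5)]


def greedy_decode(logits_fn, prompt, max_new_tokens, eos_id=None):
    """Append the argmax token at each step until the length cap or EOS."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = logits_fn(tokens)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return tokens


print(greedy_decode(toy_logits, [0], 4))  # [0, 1, 2, 3, 4]
```

A real run would swap `toy_logits` for a forward pass of the pretrained transformer and decode the resulting ids back to text with the tokenizer; the loop structure stays the same.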