Update README.md
README.md
CHANGED
@@ -1,14 +1,41 @@
+128*30+256*15+256*14.5+256*46.5+256*17=27648w
+
 ---
 language: zh
 tags:
 - roformer
 - pytorch
 - tf2.0
+- paddlepaddle
 widget:
 - text: "今天[MASK]很好,我想去公园玩!"
 ---
 ## Introduction
 Pretrained on the 13 GB cluecorpussmall dataset with the `Whole Mask LM` and `SOP` tasks.
+
+The training logic follows this example: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/language_model/ernie-1.0
+
+## Training details:
+- paddlepaddle+paddlenlp
+- V100 x 4
+- batch size 256
+- max_seq_len 512
+- max_lr 0.0001
+- min_lr 0.00001
+- weight_decay 0.01
+- grad_clip 1.0
+- Total training samples (batch size x steps, w = 10,000): 128*30w+256*15w+256*14.5w+256*46.5w+256*17w=27648w
+- Roughly 54% of a run at batch size 512 for 100w steps
+
+Final loss:
+```python
+[2022-02-05 16:05:59,067] [ INFO] - global step 170100, loss: 2.651634932, lm_loss: 2.603405, sop_loss: 0.048229, speed: 1.06 steps/s, ips: 271.68 seqs/s, learning rate: 6.66465e-05, loss_scaling: 137438.96875, num_good_steps: 356, num_bad_steps: 0
+[2022-02-05 16:07:28,227] [ INFO] - global step 170200, loss: 2.822231531, lm_loss: 2.662831, sop_loss: 0.159401, speed: 1.12 steps/s, ips: 287.13 seqs/s, learning rate: 6.66263e-05, loss_scaling: 137438.96875, num_good_steps: 59, num_bad_steps: 0
+[2022-02-05 16:08:57,346] [ INFO] - global step 170300, loss: 2.710968971, lm_loss: 2.673646, sop_loss: 0.037323, speed: 1.12 steps/s, ips: 287.26 seqs/s, learning rate: 6.66061e-05, loss_scaling: 137438.96875, num_good_steps: 159, num_bad_steps: 0
+[2022-02-05 16:10:26,698] [ INFO] - global step 170400, loss: 2.867662907, lm_loss: 2.619032, sop_loss: 0.248631, speed: 1.12 steps/s, ips: 286.51 seqs/s, learning rate: 6.65859e-05, loss_scaling: 137438.96875, num_good_steps: 259, num_bad_steps: 0
+[2022-02-05 16:11:55,714] [ INFO] - global step 170500, loss: 3.158756495, lm_loss: 2.953678, sop_loss: 0.205079, speed: 1.12 steps/s, ips: 287.59 seqs/s, learning rate: 6.65657e-05, loss_scaling: 137438.96875, num_good_steps: 359, num_bad_steps: 0
+[2022-02-05 16:13:24,869] [ INFO] - global step 170600, loss: 2.860815048, lm_loss: 2.754750, sop_loss: 0.106064, speed: 1.12 steps/s, ips: 287.14 seqs/s, learning rate: 6.65455e-05, loss_scaling: 137438.96875, num_good_steps: 33, num_bad_steps: 0
+```
 ### TF version
 https://github.com/ZhuiyiTechnology/roformer
 
@@ -51,5 +78,4 @@ Bibtex:
 eprint={2104.09864},
 archivePrefix={arXiv},
 primaryClass={cs.CL}
-}
-```
+}
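
The sample count quoted in the training details (27648w, about 54% of a 512 x 100w run) can be checked with a few lines of Python. This is only a sketch. It assumes that `w` stands for 10,000 and that each term in the README formula is a batch size multiplied by a step count. The names `phases`, `total_samples`, and `reference_samples` are illustrative and do not come from the README.

```python
# Assumption: "w" means 10,000, so "30w" is 300,000 steps.
W = 10_000

# (batch size, steps in units of w), copied term by term from the README formula.
phases = [(128, 30), (256, 15), (256, 14.5), (256, 46.5), (256, 17)]

# Samples seen = sum over phases of batch size * number of steps.
total_samples = sum(batch * steps_w * W for batch, steps_w in phases)

# Reference run named in the README: batch size 512 for 100w steps.
reference_samples = 512 * 100 * W

print(f"total samples: {total_samples / W:g}w")
print(f"fraction of reference run: {total_samples / reference_samples:.0%}")
```

Under these assumptions the total matches the 27648w figure and the 54% ratio stated in the README.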
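
The log excerpt in the diff can also be sanity-checked. The sketch below parses two of the logged lines (trailing fields omitted for brevity) and checks that `loss` is approximately `lm_loss + sop_loss`. It also prints `ips / speed`, which appears consistent with the batch size of 256 listed in the training details. That reading of `ips` is an assumption, not something the README states. The helper `parse_line` and the regular expression are illustrative and not part of PaddleNLP.

```python
import re

# Two lines copied from the log excerpt; fields after "ips" are left out.
LOG_LINES = [
    "[2022-02-05 16:05:59,067] [ INFO] - global step 170100, loss: 2.651634932, "
    "lm_loss: 2.603405, sop_loss: 0.048229, speed: 1.06 steps/s, ips: 271.68 seqs/s",
    "[2022-02-05 16:07:28,227] [ INFO] - global step 170200, loss: 2.822231531, "
    "lm_loss: 2.662831, sop_loss: 0.159401, speed: 1.12 steps/s, ips: 287.13 seqs/s",
]

# Matches "name: number" pairs; the timestamp has no "colon + space", so it is skipped.
FIELD = re.compile(r"(\w+): ([0-9.eE+-]+)")


def parse_line(line: str) -> dict:
    """Extract the numeric `name: value` fields from one log line."""
    return {name: float(value) for name, value in FIELD.findall(line)}


for line in LOG_LINES:
    fields = parse_line(line)
    # Difference between the reported total and the sum of the two task losses.
    residual = abs(fields["loss"] - (fields["lm_loss"] + fields["sop_loss"]))
    # Sequences per step implied by the reported throughput figures.
    seqs_per_step = fields["ips"] / fields["speed"]
    print(f"loss residual {residual:.1e}, seqs per step {seqs_per_step:.1f}")
```

Because `speed` is logged with only two decimals, the sequences-per-step value is approximate.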