junnyu committed on
Commit
79c19a2
·
1 Parent(s): 30fc258

Update README.md

  1. README.md +28 -2
README.md CHANGED
@@ -1,14 +1,41 @@
 
 
1
  ---
2
  language: zh
3
  tags:
4
  - roformer
5
  - pytorch
6
  - tf2.0
 
7
  widget:
8
  - text: "今天[MASK]很好,我想去公园玩!"
9
  ---
10
  ## Introduction
11
  Pretrained on the 13 GB CLUECorpusSmall dataset, using the `Whole Mask LM` and `SOP` tasks.
 
12
  ### TF version
13
  https://github.com/ZhuiyiTechnology/roformer
14
 
@@ -51,5 +78,4 @@ Bibtex:
51
  eprint={2104.09864},
52
  archivePrefix={arXiv},
53
  primaryClass={cs.CL}
54
- }
55
- ```
 
1
+ 128*30+256*15+256*14.5+256*46.5+256*17=27648w
2
+
3
  ---
4
  language: zh
5
  tags:
6
  - roformer
7
  - pytorch
8
  - tf2.0
9
+ - paddlepaddle
10
  widget:
11
  - text: "今天[MASK]很好,我想去公园玩!"
12
  ---
13
  ## Introduction
14
  Pretrained on the 13 GB CLUECorpusSmall dataset, using the `Whole Mask LM` and `SOP` tasks.
15
+
16
+ The training logic follows the ERNIE 1.0 example in PaddleNLP: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/language_model/ernie-1.0
17
+
18
+ ## Training details
19
+ - paddlepaddle+paddlenlp
20
+ - V100 x 4
21
+ - batch size 256
22
+ - max_seq_len 512
23
+ - max_lr 0.0001
24
+ - min_lr 0.00001
25
+ - weight_decay 0.01
26
+ - grad_clip 1.0
27
+ - Total samples trained (batch size × steps, where w = 10,000): 128*30w + 256*15w + 256*14.5w + 256*46.5w + 256*17w = 27648w
28
+ - Roughly 54% of a run with batch size 512 for 100w steps
29
+
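The sample-count arithmetic in the training details can be checked with a short script. This is a minimal sketch: it assumes `w` stands for 10,000 (万), and the names `PHASES` and `total_samples` are illustrative, not taken from the training code.

```python
# "w" in the README is shorthand for 10,000 (assumption stated above).
W = 10_000

# (batch size, steps in units of w) for each training phase,
# copied from the README's "training details" list.
PHASES = [(128, 30), (256, 15), (256, 14.5), (256, 46.5), (256, 17)]


def total_samples(phases, unit=W):
    """Sum batch_size * steps over all phases, returning a sample count."""
    return sum(round(batch_size * steps * unit) for batch_size, steps in phases)


trained = total_samples(PHASES)   # samples seen during this run
reference = 512 * 100 * W         # batch size 512 for 100w steps
fraction = trained / reference

print(trained, reference, f"{fraction:.0%}")
# prints: 276480000 512000000 54%
```

The result matches the 27648w total and the 54% figure quoted in the list.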
30
+ Final loss:
31
+ ```python
32
+ [2022-02-05 16:05:59,067] [ INFO] - global step 170100, loss: 2.651634932, lm_loss: 2.603405, sop_loss: 0.048229, speed: 1.06 steps/s, ips: 271.68 seqs/s, learning rate: 6.66465e-05, loss_scaling: 137438.96875, num_good_steps: 356, num_bad_steps: 0
33
+ [2022-02-05 16:07:28,227] [ INFO] - global step 170200, loss: 2.822231531, lm_loss: 2.662831, sop_loss: 0.159401, speed: 1.12 steps/s, ips: 287.13 seqs/s, learning rate: 6.66263e-05, loss_scaling: 137438.96875, num_good_steps: 59, num_bad_steps: 0
34
+ [2022-02-05 16:08:57,346] [ INFO] - global step 170300, loss: 2.710968971, lm_loss: 2.673646, sop_loss: 0.037323, speed: 1.12 steps/s, ips: 287.26 seqs/s, learning rate: 6.66061e-05, loss_scaling: 137438.96875, num_good_steps: 159, num_bad_steps: 0
35
+ [2022-02-05 16:10:26,698] [ INFO] - global step 170400, loss: 2.867662907, lm_loss: 2.619032, sop_loss: 0.248631, speed: 1.12 steps/s, ips: 286.51 seqs/s, learning rate: 6.65859e-05, loss_scaling: 137438.96875, num_good_steps: 259, num_bad_steps: 0
36
+ [2022-02-05 16:11:55,714] [ INFO] - global step 170500, loss: 3.158756495, lm_loss: 2.953678, sop_loss: 0.205079, speed: 1.12 steps/s, ips: 287.59 seqs/s, learning rate: 6.65657e-05, loss_scaling: 137438.96875, num_good_steps: 359, num_bad_steps: 0
37
+ [2022-02-05 16:13:24,869] [ INFO] - global step 170600, loss: 2.860815048, lm_loss: 2.754750, sop_loss: 0.106064, speed: 1.12 steps/s, ips: 287.14 seqs/s, learning rate: 6.65455e-05, loss_scaling: 137438.96875, num_good_steps: 33, num_bad_steps: 0
38
+ ```
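Log lines in this format can be parsed to track the loss terms over time. The sketch below is illustrative rather than part of the training code: the regular expression and the `parse_log` helper are assumptions, and the two sample lines are copied from the log above with the trailing fields trimmed. In the logged values, `loss` appears to equal `lm_loss + sop_loss` up to rounding, which the loop checks.

```python
import re

# Two lines copied from the training log, trimmed to the fields used here.
LOG = """\
[2022-02-05 16:05:59,067] [ INFO] - global step 170100, loss: 2.651634932, lm_loss: 2.603405, sop_loss: 0.048229, speed: 1.06 steps/s, ips: 271.68 seqs/s
[2022-02-05 16:07:28,227] [ INFO] - global step 170200, loss: 2.822231531, lm_loss: 2.662831, sop_loss: 0.159401, speed: 1.12 steps/s, ips: 287.13 seqs/s
"""

# Hypothetical pattern written for this log format only.
PATTERN = re.compile(
    r"global step (?P<step>\d+), loss: (?P<loss>[\d.]+), "
    r"lm_loss: (?P<lm_loss>[\d.]+), sop_loss: (?P<sop_loss>[\d.]+)"
)


def parse_log(text):
    """Return one dict per matching log line with the step and loss terms."""
    records = []
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            records.append({
                "step": int(match["step"]),
                "loss": float(match["loss"]),
                "lm_loss": float(match["lm_loss"]),
                "sop_loss": float(match["sop_loss"]),
            })
    return records


records = parse_log(LOG)
for record in records:
    # Difference between the logged total and the sum of its two parts.
    gap = abs(record["loss"] - (record["lm_loss"] + record["sop_loss"]))
    print(record["step"], f"gap={gap:.6f}")
```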
39
  ### TF version
40
  https://github.com/ZhuiyiTechnology/roformer
41
 
 
78
  eprint={2104.09864},
79
  archivePrefix={arXiv},
80
  primaryClass={cs.CL}
81
+ }