paddlenlp
PaddlePaddle
Chinese
ernie
ernie-3.0-nano-zh / README.md
sijunhe's picture
Update README.md
3b9f5d9
|
raw
history blame
51.6 kB
metadata
license: apache-2.0
library_name: paddlenlp

paddlenlp-banner

PaddlePaddle/ernie-3.0-nano-zh

Intro

ERNIE Tiny Models are lightweight models obtained from Wenxin large model ERNIE 3.0 using distillation technology. The model structure is consistent with ERNIE 2.0, and has a stronger Chinese effect than ERNIE 2.0.

For a detailed explanation of related technologies, please refer to the article 解析全球最大中文单体模型鹏城-百度·文心技术细节

How to Use

Click on the "Use in paddlenlp" on the top right corner!

Performance

ERNIE Tiny 3.0 open sources five models: ERNIE 3.0 BaseERNIE 3.0 MediumERNIE 3.0 MiniERNIE 3.0 MicroERNIE 3.0 Nano

  • ERNIE 3.0-Base (12-layer, 768-hidden, 12-heads)
  • ERNIE 3.0-Medium (6-layer, 768-hidden, 12-heads)
  • ERNIE 3.0-Mini (6-layer, 384-hidden, 12-heads)
  • ERNIE 3.0-Micro (4-layer, 384-hidden, 12-heads)
  • ERNIE 3.0-Nano (4-layer, 312-hidden, 12-heads)

Below is the precision-latency graph of the small Chinese models in PaddleNLP. The abscissa represents the latency (unit: ms) tested on CLUE IFLYTEK dataset (maximum sequence length is set to 128), and the ordinate is the average accuracy on 10 CLUE tasks (including text classification, text matching, natural language inference, Pronoun disambiguation, machine reading comprehension and other tasks), among which the metric of CMRC2018 is Exact Match (EM), and the metric of other tasks is Accuracy. The closer the model to the top left in the figure, the higher the level of accuracy and performance.The top left model in the figure has the highest level of accuracy and performance.

The number of parameters of the model are marked under the model name in the figure. For the test environment, see Performance Test in details.

precision-latency graph under CPU (number of threads: 1 and 8), batch_size = 32:

precision-latency graph under CPU (number of threads: 1 and 8), batch_size = 1:

precision-latency graph under GPU, batch_size = 32, 1:

As can be seen from the figure, the comprehensive performance of the ERNIE Tiny 3.0 models has been comprehensively ahead of UER-py, Huawei-Noah and HFL in terms of accuracy and performance. And when batch_size=1 and the precision mode is FP16, the inference performance of the wide and shallow model on the GPU is more advantageous.

The precision data on the CLUE validation set are shown in the following table:

Arch Model AVG AFQMC TNEWS IFLYTEK CMNLI OCNLI CLUEWSC2020 CSL CMRC2018 CHID C3
24L1024H ERNIE 1.0-Large-cw 79.03 75.97 59.65 62.91 85.09 81.73 93.09 84.53 74.22/91.88 88.57 84.54
ERNIE 2.0-Large-zh 76.90 76.23 59.33 61.91 83.85 79.93 89.82 83.23 70.95/90.31 86.78 78.12
RoBERTa-wwm-ext-large 76.61 76.00 59.33 62.02 83.88 78.81 90.79 83.67 70.58/89.82 85.72 75.26
20L1024H ERNIE 3.0-Xbase-zh 78.39 76.16 59.55 61.87 84.40 81.73 88.82 83.60 75.99/93.00 86.78 84.98
12L768H ERNIE 3.0-Base-zh 76.05 75.93 58.26 61.56 83.02 80.10 86.18 82.63 70.71/90.41 84.26 77.88
ERNIE 1.0-Base-zh-cw 76.47 76.07 57.86 59.91 83.41 79.58 89.91 83.42 72.88/90.78 84.68 76.98
ERNIE-Gram-zh 75.72 75.28 57.88 60.87 82.90 79.08 88.82 82.83 71.82/90.38 84.04 73.69
Langboat/Mengzi-BERT-Base 74.69 75.35 57.76 61.64 82.41 77.93 88.16 82.20 67.04/88.35 83.74 70.70
ERNIE 2.0-Base-zh 74.32 75.65 58.25 61.64 82.62 78.71 81.91 82.33 66.08/87.46 82.78 73.19
ERNIE 1.0-Base-zh 74.17 74.84 58.91 62.25 81.68 76.58 85.20 82.77 67.32/87.83 82.47 69.68
RoBERTa-wwm-ext 74.11 74.60 58.08 61.23 81.11 76.92 88.49 80.77 68.39/88.50 83.43 68.03
BERT-Base-Chinese 72.57 74.63 57.13 61.29 80.97 75.22 81.91 81.90 65.30/86.53 82.01 65.38
UER/Chinese-RoBERTa-Base 71.78 72.89 57.62 61.14 80.01 75.56 81.58 80.80 63.87/84.95 81.52 62.76
8L512H UER/Chinese-RoBERTa-Medium 67.06 70.64 56.10 58.29 77.35 71.90 68.09 78.63 57.63/78.91 75.13 56.84
6L768H ERNIE 3.0-Medium-zh 72.49 73.37 57.00 60.67 80.64 76.88 79.28 81.60 65.83/87.30 79.91 69.73
HLF/RBT6, Chinese 70.06 73.45 56.82 59.64 79.36 73.32 76.64 80.67 62.72/84.77 78.17 59.85
TinyBERT6, Chinese 69.62 72.22 55.70 54.48 79.12 74.07 77.63 80.17 63.03/83.75 77.64 62.11
RoFormerV2 Small 68.52 72.47 56.53 60.72 76.37 72.95 75.00 81.07 62.97/83.64 67.66 59.41
UER/Chinese-RoBERTa-L6-H768 67.09 70.13 56.54 60.48 77.49 72.00 72.04 77.33 53.74/75.52 76.73 54.40
6L384H ERNIE 3.0-Mini-zh 66.90 71.85 55.24 54.48 77.19 73.08 71.05 79.30 58.53/81.97 69.71 58.60
4L768H HFL/RBT4, Chinese 67.42 72.41 56.50 58.95 77.34 70.78 71.05 78.23 59.30/81.93 73.18 56.45
4L512H UER/Chinese-RoBERTa-Small 63.25 69.21 55.41 57.552 73.64 69.80 66.78 74.83 46.75/69.69 67.59 50.92
4L384H ERNIE 3.0-Micro-zh 64.21 71.15 55.05 53.83 74.81 70.41 69.08 76.50 53.77/77.82 62.26 55.53
4L312H ERNIE 3.0-Nano-zh 62.97 70.51 54.57 48.36 74.97 70.61 68.75 75.93 52.00/76.35 58.91 55.11
TinyBERT4, Chinese 60.82 69.07 54.02 39.71 73.94 69.59 70.07 75.07 46.04/69.34 58.53 52.18
4L256H UER/Chinese-RoBERTa-Mini 53.40 69.32 54.22 41.63 69.40 67.36 65.13 70.07 5.96/17.13 51.19 39.68
3L1024H HFL/RBTL3, Chinese 66.63 71.11 56.14 59.56 76.41 71.29 69.74 76.93 58.50/80.90 71.03 55.56
3L768H HFL/RBT3, Chinese 65.72 70.95 55.53 59.18 76.20 70.71 67.11 76.63 55.73/78.63 70.26 54.93
2L128H UER/Chinese-RoBERTa-Tiny 44.45 69.02 51.47 20.28 59.95 57.73 63.82 67.43 3.08/14.33 23.57 28.12

Citation Info

@article{sun2021ernie,
  title={Ernie 3.0: Large-scale knowledge enhanced pre-training for language understanding and generation},
  author={Sun, Yu and Wang, Shuohuan and Feng, Shikun and Ding, Siyu and Pang, Chao and Shang, Junyuan and Liu, Jiaxiang and Chen, Xuyi and Zhao, Yanbin and Lu, Yuxiang and others},
  journal={arXiv preprint arXiv:2107.02137},
  year={2021}
}

@article{su2021ernie,
  title={Ernie-tiny: A progressive distillation framework for pretrained transformer compression},
  author={Su, Weiyue and Chen, Xuyi and Feng, Shikun and Liu, Jiaxiang and Liu, Weixin and Sun, Yu and Tian, Hao and Wu, Hua and Wang, Haifeng},
  journal={arXiv preprint arXiv:2106.02241},
  year={2021}
}

@article{wang2021ernie,
  title={Ernie 3.0 titan: Exploring larger-scale knowledge enhanced pre-training for language understanding and generation},
  author={Wang, Shuohuan and Sun, Yu and Xiang, Yang and Wu, Zhihua and Ding, Siyu and Gong, Weibao and Feng, Shikun and Shang, Junyuan and Zhao, Yanbin and Pang, Chao and others},
  journal={arXiv preprint arXiv:2112.12731},
  year={2021}
}