# WoBERT: Word-based Chinese BERT model
Original model and documentation: https://github.com/ZhuiyiTechnology/WoBERT
PyTorch version: https://github.com/JunnYu/WoBERT_pytorch
## Install WoBertTokenizer

```bash
pip install git+https://github.com/JunnYu/WoBERT_pytorch.git
```
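Since WoBERT's main difference from a character-level Chinese BERT is its word-level vocabulary, a quick sanity check after installation is to look at the tokenizer output directly. The snippet below is a sketch of my own (the model id `qinluo/wobert-chinese-plus` is the one used in the examples below); the exact segmentation depends on the tokenizer's word vocabulary.

```python
from wobert import WoBertTokenizer

# Sanity-check sketch (not from the original card): multi-character words
# such as '天气' are expected to stay as single tokens rather than being
# split into individual characters.
tokenizer = WoBertTokenizer.from_pretrained('qinluo/wobert-chinese-plus')
print(tokenizer.tokenize('今天天气很好,我想去公园玩。'))
```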
## TensorFlow Example
```python
from transformers import TFBertForMaskedLM as WoBertForMaskedLM
from wobert import WoBertTokenizer
import tensorflow as tf

pretrained_model_or_path = 'qinluo/wobert-chinese-plus'
tokenizer = WoBertTokenizer.from_pretrained(pretrained_model_or_path)
model = WoBertForMaskedLM.from_pretrained(pretrained_model_or_path)

text = '今天[MASK]很好,我[MASK]去公园玩。'
inputs = tokenizer(text, return_tensors='tf')
outputs = model(**inputs).logits[0]

# Replace each [MASK] with its top-5 candidate tokens; copy other tokens through.
outputs_sentence = ''
for i, token_id in enumerate(tokenizer.encode(text)):
    if token_id == tokenizer.mask_token_id:
        tokens = tokenizer.convert_ids_to_tokens(tf.math.top_k(outputs[i], k=5)[1])
        outputs_sentence += '[' + '|'.join(tokens) + ']'
    else:
        outputs_sentence += ''.join(tokenizer.convert_ids_to_tokens([token_id], skip_special_tokens=True))

print(outputs_sentence)
# 今天[天气|阳光|天|心情|空气]很好,我[想|要|打算|准备|就]去公园玩。
```
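A small variant of the decoding loop above (my own sketch, reusing `tokenizer`, `outputs`, and `text` from the example): keep only the top-1 token per `[MASK]` to obtain a single completed sentence instead of candidate lists.

```python
# Sketch (not from the original card): greedy top-1 filling per [MASK].
filled = ''
for i, token_id in enumerate(tokenizer.encode(text)):
    if token_id == tokenizer.mask_token_id:
        best_id = int(tf.math.argmax(outputs[i]))
        filled += tokenizer.convert_ids_to_tokens([best_id])[0]
    else:
        filled += ''.join(tokenizer.convert_ids_to_tokens([token_id], skip_special_tokens=True))

print(filled)  # expected, given the top-5 lists above: 今天天气很好,我想去公园玩。
```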
## PyTorch Example
```python
from transformers import BertForMaskedLM as WoBertForMaskedLM
from wobert import WoBertTokenizer

pretrained_model_or_path = 'qinluo/wobert-chinese-plus'
tokenizer = WoBertTokenizer.from_pretrained(pretrained_model_or_path)
model = WoBertForMaskedLM.from_pretrained(pretrained_model_or_path)

text = '今天[MASK]很好,我[MASK]去公园玩。'
inputs = tokenizer(text, return_tensors='pt')
outputs = model(**inputs).logits[0]

# Replace each [MASK] with its top-5 candidate tokens; copy other tokens through.
outputs_sentence = ''
for i, token_id in enumerate(tokenizer.encode(text)):
    if token_id == tokenizer.mask_token_id:
        tokens = tokenizer.convert_ids_to_tokens(outputs[i].topk(k=5)[1])
        outputs_sentence += '[' + '|'.join(tokens) + ']'
    else:
        outputs_sentence += ''.join(tokenizer.convert_ids_to_tokens([token_id], skip_special_tokens=True))

print(outputs_sentence)
# 今天[天气|阳光|天|心情|空气]很好,我[想|要|打算|准备|就]去公园玩。
```
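For feature extraction rather than mask filling, the same checkpoint can also be loaded with the generic `BertModel` encoder class from `transformers` to obtain word-level hidden states. This is a minimal sketch of my own, not part of the original card:

```python
import torch
from transformers import BertModel
from wobert import WoBertTokenizer

pretrained_model_or_path = 'qinluo/wobert-chinese-plus'
tokenizer = WoBertTokenizer.from_pretrained(pretrained_model_or_path)
encoder = BertModel.from_pretrained(pretrained_model_or_path)

inputs = tokenizer('今天天气很好,我想去公园玩。', return_tensors='pt')
with torch.no_grad():
    # last_hidden_state has shape (batch, num_word_tokens, hidden_size)
    hidden_states = encoder(**inputs).last_hidden_state
print(hidden_states.shape)
```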
## Citation

BibTeX:
```bibtex
@techreport{zhuiyiwobert,
  title={WoBERT: Word-based Chinese BERT model - ZhuiyiAI},
  author={Jianlin Su},
  year={2020},
  url={https://github.com/ZhuiyiTechnology/WoBERT},
}
```