File size: 2,094 Bytes
3999629 664fc70 3999629 664fc70 7b436d9 664fc70 023658a 664fc70 023658a 664fc70 7b436d9 664fc70 7b436d9 664fc70 7b5f4fe 664fc70 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 |
---
language:
- zh
thumbnail: https://ckip.iis.sinica.edu.tw/files/ckip_logo.png
tags:
- pytorch
- token-classification
- bert
- zh
license: gpl-3.0
---
# CKIP BERT Base Han Chinese WS
This model provides word segmentation for the ancient Chinese language. Our training dataset covers four eras of the Chinese language.
## Homepage
* [ckiplab/han-transformers](https://github.com/ckiplab/han-transformers)
## Training Datasets
The copyright of the datasets belongs to the Institute of Linguistics, Academia Sinica.
* [中央研究院上古漢語標記語料庫](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/akiwi/kiwi.sh)
* [中央研究院中古漢語語料庫](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/dkiwi/kiwi.sh)
* [中央研究院近代漢語語料庫](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/pkiwi/kiwi.sh)
* [中央研究院現代漢語語料庫](http://asbc.iis.sinica.edu.tw)
## Contributors
* Chin-Tung Lin at [CKIP](https://ckip.iis.sinica.edu.tw/)
## Usage
* Using our model in your script
```python
from transformers import (
AutoTokenizer,
AutoModel,
)
tokenizer = AutoTokenizer.from_pretrained("ckiplab/bert-base-han-chinese-ws")
model = AutoModel.from_pretrained("ckiplab/bert-base-han-chinese-ws")
```
* Using our model for inference
```python
>>> from transformers import pipeline
>>> classifier = pipeline("token-classification", model="ckiplab/bert-base-han-chinese-ws")
>>> classifier("帝堯曰放勳")
# output
[{'entity': 'B',
'score': 0.9999793,
'index': 1,
'word': '帝',
'start': 0,
'end': 1},
{'entity': 'I',
'score': 0.9915047,
'index': 2,
'word': '堯',
'start': 1,
'end': 2},
{'entity': 'B',
'score': 0.99992275,
'index': 3,
'word': '曰',
'start': 2,
'end': 3},
{'entity': 'B',
'score': 0.99905187,
'index': 4,
'word': '放',
'start': 3,
'end': 4},
{'entity': 'I',
'score': 0.96299917,
'index': 5,
'word': '勳',
'start': 4,
'end': 5}]
``` |