---
language:
- zh
thumbnail: https://ckip.iis.sinica.edu.tw/files/ckip_logo.png
tags:
- pytorch
- token-classification
- bert
- zh
license: gpl-3.0
---
# CKIP Oldhan BERT Base Chinese WS
This model performs word segmentation on historical Chinese text. The training data spans four eras of the Chinese language: Old, Middle, Early Modern, and Modern Chinese.
## Homepage
* [ckiplab/han-transformers](https://github.com/ckiplab/han-transformers)
## Training Datasets
The copyright of the datasets belongs to the Institute of Linguistics, Academia Sinica.
* [Academia Sinica Tagged Corpus of Old Chinese (中央研究院上古漢語標記語料庫)](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/akiwi/kiwi.sh?ukey=-406192123&qtype=-1)
* [Academia Sinica Corpus of Middle Chinese (中央研究院中古漢語語料庫)](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/dkiwi/kiwi.sh?ukey=852967425&qtype=-1)
* [Academia Sinica Corpus of Early Modern Chinese (中央研究院近代漢語語料庫)](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/pkiwi/kiwi.sh?ukey=-299696128&qtype=-1)
* [Academia Sinica Corpus of Modern Chinese (中央研究院現代漢語語料庫)](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/mkiwi/kiwi.sh)
## Contributors
* Chin-Tung Lin at [CKIP](https://ckip.iis.sinica.edu.tw/)
## Usage
* Using our model in your script
```python
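from transformers import (
    AutoTokenizer,
    AutoModel,
)

# Download the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("ckiplab/oldhan-bert-base-chinese-ws")
# AutoModel loads the BERT encoder only; use AutoModelForTokenClassification
# instead if you need the per-token segmentation logits.
model = AutoModel.from_pretrained("ckiplab/oldhan-bert-base-chinese-ws")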
```
* Using our model for inference
```python
>>> from transformers import pipeline
>>> classifier = pipeline("token-classification", model="ckiplab/oldhan-bert-base-chinese-ws")
>>> classifier("帝堯曰放勳")
```
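The pipeline returns one prediction per character, not a segmented string. The sketch below shows one way to merge those predictions into words. It assumes each prediction is a dict with `word` and `entity` keys (the default `token-classification` pipeline output) and that the model labels word-initial characters `B` and the rest `I`. The label names are an assumption, so check them against `model.config.id2label`. `merge_tokens` is a hypothetical helper, not part of `transformers`, and the sample input is hand-written rather than real model output.

```python
def merge_tokens(predictions, begin_label="B"):
    """Group per-character predictions into words.

    A new word starts at every character tagged `begin_label`
    (assumed to be "B"); any other tag extends the current word.
    """
    words = []
    for item in predictions:
        if item["entity"] == begin_label or not words:
            words.append(item["word"])
        else:
            words[-1] += item["word"]
    return words


# Hand-written stand-in for pipeline output; the labels are illustrative only.
sample = [
    {"word": "帝", "entity": "B"},
    {"word": "堯", "entity": "I"},
    {"word": "曰", "entity": "B"},
    {"word": "放", "entity": "B"},
    {"word": "勳", "entity": "I"},
]
print(" ".join(merge_tokens(sample)))
```

With real predictions, pass the list returned by `classifier(...)` directly to `merge_tokens`.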