---
language:
- zh
thumbnail: https://ckip.iis.sinica.edu.tw/files/ckip_logo.png
tags:
- pytorch
- token-classification
- bert
- zh
license: gpl-3.0
---
# CKIP BERT Base Han Chinese WS
This model provides word segmentation for ancient Chinese texts. Our training dataset covers four eras of the Chinese language.
## Homepage
* [ckiplab/han-transformers](https://github.com/ckiplab/han-transformers)
## Training Datasets
The copyright of the datasets belongs to the Institute of Linguistics, Academia Sinica.
* [Academia Sinica Tagged Corpus of Old Chinese (中央研究院上古漢語標記語料庫)](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/akiwi/kiwi.sh)
* [Academia Sinica Corpus of Middle Chinese (中央研究院中古漢語語料庫)](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/dkiwi/kiwi.sh)
* [Academia Sinica Corpus of Early Modern Chinese (中央研究院近代漢語語料庫)](http://lingcorpus.iis.sinica.edu.tw/cgi-bin/kiwi/pkiwi/kiwi.sh)
* [Academia Sinica Balanced Corpus of Modern Chinese (中央研究院現代漢語語料庫)](http://asbc.iis.sinica.edu.tw)
## Contributors
* Chin-Tung Lin at [CKIP](https://ckip.iis.sinica.edu.tw/)
## Usage
* Using our model in your script
```python
from transformers import (
    AutoTokenizer,
    AutoModelForTokenClassification,
)

# Load the tokenizer and the model with its word-segmentation (token-classification) head.
tokenizer = AutoTokenizer.from_pretrained("ckiplab/bert-base-han-chinese-ws")
model = AutoModelForTokenClassification.from_pretrained("ckiplab/bert-base-han-chinese-ws")
```
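
The loaded model tags each character with `B` (word-initial) or `I` (word-internal). A minimal sketch of running it directly, assuming `torch` is installed and that the tokenizer splits Chinese input one character per token (variable names are illustrative):

```python
import torch

text = "帝堯曰放勳"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits      # shape: (1, seq_len, num_labels)

# Drop the [CLS]/[SEP] positions and map label ids to B/I tags.
pred_ids = logits.argmax(dim=-1)[0][1:-1]
labels = [model.config.id2label[i.item()] for i in pred_ids]
print(list(zip(text, labels)))           # e.g. [('帝', 'B'), ('堯', 'I'), ...]
```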
* Using our model for inference
```python
>>> from transformers import pipeline
>>> classifier = pipeline("token-classification", model="ckiplab/bert-base-han-chinese-ws")
>>> classifier("帝堯曰放勳")
# output
[{'entity': 'B',
  'score': 0.9999793,
  'index': 1,
  'word': '帝',
  'start': 0,
  'end': 1},
 {'entity': 'I',
  'score': 0.9915047,
  'index': 2,
  'word': '堯',
  'start': 1,
  'end': 2},
 {'entity': 'B',
  'score': 0.99992275,
  'index': 3,
  'word': '曰',
  'start': 2,
  'end': 3},
 {'entity': 'B',
  'score': 0.99905187,
  'index': 4,
  'word': '放',
  'start': 3,
  'end': 4},
 {'entity': 'I',
  'score': 0.96299917,
  'index': 5,
  'word': '勳',
  'start': 4,
  'end': 5}]
```
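
In this tag scheme, `B` starts a new word and `I` continues it, so the output above segments the sentence into 帝堯 / 曰 / 放勳. A small helper can fold the pipeline output back into words; `group_words` below is our own illustrative function, not part of the library:

```python
def group_words(tokens):
    """Fold B/I-tagged pipeline output into a list of segmented words."""
    words = []
    for tok in tokens:
        if tok["entity"] == "B" or not words:
            words.append(tok["word"])   # start a new word
        else:
            words[-1] += tok["word"]    # append to the current word
    return words

print(group_words(classifier("帝堯曰放勳")))
# ['帝堯', '曰', '放勳']
```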