CKIP BERT Base Han Chinese

A BERT-base model pretrained on Ancient Chinese with a masked language modeling (MLM) objective.
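The card states only that an MLM objective was used. The sketch below illustrates the masking step of the original BERT recipe. Each token is selected for prediction with probability 15%. A selected token is replaced by [MASK] 80% of the time, by a random vocabulary token 10% of the time, and left unchanged otherwise. These rates are BERT's published defaults and are an assumption here, because the card does not say which settings CKIP used. The function name `mask_tokens` is hypothetical, and special tokens such as [CLS] and [SEP] are not handled.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,  # assumed: BERT default, not confirmed for this model
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Apply BERT-style MLM corruption to a sequence of (character) tokens.

    Returns (inputs, labels). labels[i] holds the original token at each
    position selected for prediction and None everywhere else.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not selected: input unchanged, no loss at this position
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK  # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token as input
    return inputs, labels


if __name__ == "__main__":
    chars = list("黎民於變時雍。")
    inputs, labels = mask_tokens(chars, vocab=chars, mask_prob=0.5, rng=random.Random(0))
    print(inputs)
    print(labels)
```

Positions whose label is None contribute nothing to the loss. The model learns to recover the original character only at the selected positions, which is the same task the fill-mask pipeline exercises at inference time.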

Homepage

Training Datasets

The copyright of the datasets belongs to the Institute of Linguistics, Academia Sinica.

Contributors

  • Chin-Tung Lin at CKIP

Usage

  • Using our model in your script

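    from transformers import (
      AutoTokenizer,
      AutoModel,
    )
    
    # Load the tokenizer and the pretrained encoder from the Hugging Face Hub.
    # AutoModel returns the bare encoder without the MLM head; use
    # AutoModelForMaskedLM instead if you need masked-token predictions.
    tokenizer = AutoTokenizer.from_pretrained("ckiplab/bert-base-han-chinese")
    model = AutoModel.from_pretrained("ckiplab/bert-base-han-chinese")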
    
  • Using our model for inference

    >>> from transformers import pipeline
    >>> unmasker = pipeline('fill-mask', model='ckiplab/bert-base-han-chinese')
    >>> unmasker("黎[MASK]於變時雍。")
    
    [{'sequence': '黎 民 於 變 時 雍 。',
      'score': 0.14885780215263367,
      'token': 3696,
      'token_str': '民'},
     {'sequence': '黎 庶 於 變 時 雍 。',
      'score': 0.0859643816947937,
      'token': 2433,
      'token_str': '庶'},
     {'sequence': '黎 氏 於 變 時 雍 。',
      'score': 0.027848130092024803,
      'token': 3694,
      'token_str': '氏'},
     {'sequence': '黎 人 於 變 時 雍 。',
      'score': 0.023678112775087357,
      'token': 782,
      'token_str': '人'},
     {'sequence': '黎 生 於 變 時 雍 。',
      'score': 0.018718384206295013,
      'token': 4495,
      'token_str': '生'}]
    
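The fill-mask pipeline returns a list of dictionaries whose `sequence` field separates characters with spaces, an artifact of decoding character-level tokens. A minimal post-processing sketch follows, using the five predictions shown above as its data. The helper names `clean_sequence` and `summarize` are hypothetical. The sketch strips those spaces and reports how much probability mass the candidates cover.

```python
from typing import Dict, List, Tuple, Union

Prediction = Dict[str, Union[str, float, int]]

# Top-5 predictions copied from the fill-mask output above.
predictions: List[Prediction] = [
    {"sequence": "黎 民 於 變 時 雍 。", "score": 0.14885780215263367, "token": 3696, "token_str": "民"},
    {"sequence": "黎 庶 於 變 時 雍 。", "score": 0.0859643816947937, "token": 2433, "token_str": "庶"},
    {"sequence": "黎 氏 於 變 時 雍 。", "score": 0.027848130092024803, "token": 3694, "token_str": "氏"},
    {"sequence": "黎 人 於 變 時 雍 。", "score": 0.023678112775087357, "token": 782, "token_str": "人"},
    {"sequence": "黎 生 於 變 時 雍 。", "score": 0.018718384206295013, "token": 4495, "token_str": "生"},
]


def clean_sequence(seq: str) -> str:
    """Drop the spaces the decoder inserts between characters."""
    return "".join(seq.split())


def summarize(preds: List[Prediction]) -> Tuple[List[Tuple[str, str, float]], float]:
    """Return (token, sentence, score) rows and the total score of the candidates."""
    rows = [(str(p["token_str"]), clean_sequence(str(p["sequence"])), float(p["score"])) for p in preds]
    coverage = sum(score for _, _, score in rows)
    return rows, coverage


if __name__ == "__main__":
    rows, coverage = summarize(predictions)
    for token, sentence, score in rows:
        print(f"{token}\t{sentence}\t{score:.3f}")
    print(f"top-{len(rows)} probability mass: {coverage:.3f}")
```

Each `score` is the model's softmax probability for that token at the masked position, so the five candidates here cover about 0.31 of the mass. Passing `top_k` to the pipeline call (for example `unmasker("黎[MASK]於變時雍。", top_k=10)`) returns more candidates. Joining on whitespace assumes the text contains no meaningful spaces, which holds for typical Classical Chinese.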