# Trying NER with ModernBERT

## Label mapping

```python
label_list = ["O", "B-人名", "I-人名", "B-法人名", "I-法人名", "B-政治的組織名", "I-政治的組織名",
              "B-その他の組織名", "I-その他の組織名", "B-地名", "I-地名", "B-施設名", "I-施設名",
              "B-製品名", "I-製品名", "B-イベント名", "I-イベント名"]
```

## Tokenizer and model

Use the following as a reference:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from tokenizers import Regex
from tokenizers.pre_tokenizers import Sequence, Split

model_name = "sbintuitions/modernbert-ja-130m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Pre-split every hiragana character as an isolated piece before the original pre-tokenizer runs.
tokenizer.backend_tokenizer.pre_tokenizer = Sequence([Split(Regex("[ぁ-ん]"), "isolated"), tokenizer.backend_tokenizer.pre_tokenizer])
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=17)
```
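
As a quick check (a sketch; the sentence below is an arbitrary example), you can inspect how the patched pre-tokenizer segments Japanese text. Isolating hiragana characters is presumably meant to keep subword pieces from crossing entity boundaries:

```python
# Illustrative only: see how the modified tokenizer splits a sentence.
text = "大谷翔平はロサンゼルス・ドジャースの選手です。"
print(tokenizer.tokenize(text))
```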

## Related article

https://bwgift.hatenadiary.jp/entry/2025/02/20/220323

## Dataset, model, and licenses used

- stockmark/ner-wikipedia-dataset (CC-BY-SA-3.0); a loading sketch follows below
- ModernBERT-Ja-130M (MIT)
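
For reference, a minimal loading sketch for the training data. The field names (`text`, and `entities` entries with `name`, `span`, `type`) are assumed from the published stockmark/ner-wikipedia-dataset schema, not taken from this repository, so verify them against the actual data:

```python
from datasets import load_dataset

# Sketch: load the NER corpus used for fine-tuning.
# Field names ("text", "entities" with "name"/"span"/"type") are assumed
# from the stockmark/ner-wikipedia-dataset card.
ds = load_dataset("stockmark/ner-wikipedia-dataset", split="train")

sample = ds[0]
print(sample["text"])
for ent in sample["entities"]:
    print(ent["type"], ent["name"], ent["span"])
```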