Trying NER with ModernBERT
Label mapping
label_list = ["O", "B-人名", "I-人名", "B-法人名", "I-法人名", "B-政治的組織名", "I-政治的組織名",
"B-その他の組織名", "I-その他の組織名", "B-地名", "I-地名", "B-施設名", "I-施設名",
"B-製品名", "I-製品名", "B-イベント名", "I-イベント名"]
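As a minimal sketch of how this label list might be used, the block below builds the index mappings that token-classification models typically expect and groups a BIO tag sequence into entity spans. The names `id2label`, `label2id`, and `bio_to_spans` are hypothetical helpers introduced here for illustration; they do not appear in the original card. Treating a stray `I-` tag as the start of a new entity is a lenient choice made for this sketch, not something the card specifies.

```python
# Label list copied from the card above.
label_list = ["O", "B-人名", "I-人名", "B-法人名", "I-法人名", "B-政治的組織名", "I-政治的組織名",
              "B-その他の組織名", "I-その他の組織名", "B-地名", "I-地名", "B-施設名", "I-施設名",
              "B-製品名", "I-製品名", "B-イベント名", "I-イベント名"]

# Hypothetical helper mappings between class indices and label strings.
id2label = {i: label for i, label in enumerate(label_list)}
label2id = {label: i for i, label in id2label.items()}


def bio_to_spans(tags):
    """Group BIO tags into (entity_type, start, end) spans; end is exclusive."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current
        )
        if starts_new:
            # Close any open entity, then open a new one at position i.
            if current is not None:
                spans.append((current, start, i))
            start, current = i, tag[2:]
        elif tag == "O":
            # Outside tag: close any open entity.
            if current is not None:
                spans.append((current, start, i))
            start, current = None, None
        # Otherwise the tag is an I- tag continuing the current entity.
    if current is not None:
        spans.append((current, start, len(tags)))
    return spans
```

For example, `bio_to_spans(["B-人名", "I-人名", "O", "B-地名"])` yields one 人名 span covering positions 0 to 1 and one 地名 span at position 3.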
tokenizer
Please refer to the following.
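from tokenizers import Regex
from tokenizers.pre_tokenizers import Sequence, Split
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "sbintuitions/modernbert-ja-130m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Isolate each hiragana character before the tokenizer's original pre-tokenizer runs.
tokenizer.backend_tokenizer.pre_tokenizer = Sequence([Split(Regex("[ぁ-ん]"), "isolated"), tokenizer.backend_tokenizer.pre_tokenizer])
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=17)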
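To make the effect of the hiragana split rule more concrete, here is a rough emulation in plain Python using the standard `re` module. It is only a sketch of what an "isolated" split on `[ぁ-ん]` does to a string: each matching character becomes its own piece and the text between matches is kept. It does not call the `tokenizers` library, and the function name `split_isolated` is hypothetical. In the real pipeline, the tokenizer's original pre-tokenizer would then run on each resulting piece.

```python
import re

# Same character class as the Split rule above, wrapped in a capturing
# group so that re.split keeps the matched characters in the output.
HIRAGANA = re.compile("([ぁ-ん])")


def split_isolated(text):
    """Return pieces of text with every hiragana character isolated."""
    # re.split can yield empty strings between adjacent matches; drop them.
    return [piece for piece in HIRAGANA.split(text) if piece]
```

With this emulation, `split_isolated("東京に行く")` separates the particle に and the ending く from the surrounding kanji, which presumably helps keep token boundaries from straddling entity boundaries.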
Related article
https://bwgift.hatenadiary.jp/entry/2025/02/20/220323
Datasets, models, and licenses used
- stockmark/ner-wikipedia-dataset (CC-BY-SA-3.0)
- ModernBERT-Ja-30M (MIT)
Model tree for Chottokun/modernBERT_japanese_30m_ner_wikipedia
Base model
sbintuitions/modernbert-ja-30m