---
datasets:
- stockmark/ner-wikipedia-dataset
language:
- ja
base_model:
- sbintuitions/modernbert-ja-30m
---

# Trying NER with ModernBERT

## Label mapping

```python
label_list = ["O", "B-人名", "I-人名", "B-法人名", "I-法人名", "B-政治的組織名", "I-政治的組織名",
              "B-その他の組織名", "I-その他の組織名", "B-地名", "I-地名", "B-施設名", "I-施設名",
              "B-製品名", "I-製品名", "B-イベント名", "I-イベント名"]
```
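The list above can also be expressed as the `id2label`/`label2id` mappings that `transformers` token-classification models accept in their config (a minimal sketch; the variable names are illustrative):

```python
# Build id <-> label mappings from the BIO label list above.
label_list = ["O", "B-人名", "I-人名", "B-法人名", "I-法人名", "B-政治的組織名", "I-政治的組織名",
              "B-その他の組織名", "I-その他の組織名", "B-地名", "I-地名", "B-施設名", "I-施設名",
              "B-製品名", "I-製品名", "B-イベント名", "I-イベント名"]

id2label = dict(enumerate(label_list))                    # 0 -> "O", 1 -> "B-人名", ...
label2id = {label: i for i, label in id2label.items()}    # inverse mapping

assert len(label_list) == 17  # consistent with num_labels=17 below
```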

## Tokenizer

Use the following as a reference:

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
from tokenizers import Regex
from tokenizers.pre_tokenizers import Sequence, Split

model_name = "sbintuitions/modernbert-ja-30m"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Isolate each hiragana character as its own pre-token before the default
# pre-tokenizer runs, so subword boundaries line up better with the BIO labels.
tokenizer.backend_tokenizer.pre_tokenizer = Sequence(
    [Split(Regex("[ぁ-ん]"), "isolated"), tokenizer.backend_tokenizer.pre_tokenizer]
)

model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=17)
```
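Because the tokenizer splits words into subwords, word-level BIO label ids have to be realigned to the subword tokens before training. A minimal sketch, assuming the `word_ids()` list from a fast tokenizer's encoding (the function name and the `-100` padding convention are illustrative; `-100` is what the cross-entropy loss ignores by default):

```python
# Align word-level BIO label ids to subword tokens (illustrative sketch).
# `word_ids` is what a fast tokenizer's encoding.word_ids() returns:
# None for special tokens, otherwise the index of the source word.
def align_labels(word_ids, word_label_ids):
    aligned, prev = [], None
    for wid in word_ids:
        if wid is None or wid == prev:
            aligned.append(-100)  # special token or continuation subword: ignored by the loss
        else:
            aligned.append(word_label_ids[wid])  # first subword keeps the word's label
        prev = wid
    return aligned

# Example: [CLS], word 0 split into two subwords, word 1, [SEP]
print(align_labels([None, 0, 0, 1, None], [1, 2]))  # -> [-100, 1, -100, 2, -100]
```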

## Related article

https://bwgift.hatenadiary.jp/entry/2025/02/20/220323

## Dataset, model, and licenses used

- stockmark/ner-wikipedia-dataset (CC-BY-SA-3.0)
- ModernBERT-Ja-30M (MIT)