---
language: ja
license: mit
datasets:
- wikipedia
---

# nagisa_bert

A BERT model for [nagisa](https://github.com/taishi-i/nagisa).
The model is available in [Transformers](https://github.com/huggingface/transformers) 🤗.

## Install

To use this model, the *nagisa_bert* Python library must be installed. You can install *nagisa_bert* with the *pip* command.

Python 3.7+ on Linux or macOS is required.

```bash
$ pip install nagisa_bert
```
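
After installing, you can check that the package is importable (a quick sanity check; the import below is the same one used in the examples that follow):

```python
>>> from nagisa_bert import NagisaBertTokenizer  # no error means the install succeeded
```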

## Usage

This model can be used with the Transformers `pipeline` method for masked-word prediction.

```python
>>> from transformers import pipeline
>>> from nagisa_bert import NagisaBertTokenizer

>>> text = "nagisaで[MASK]できるモデルです"
>>> tokenizer = NagisaBertTokenizer.from_pretrained("taishi-i/nagisa_bert")
>>> fill_mask = pipeline("fill-mask", model="taishi-i/nagisa_bert", tokenizer=tokenizer)
>>> print(fill_mask(text))
[{'score': 0.1385931372642517,
  'sequence': 'nagisa で 使用 できる モデル です',
  'token': 8092,
  'token_str': '使 用'},
 {'score': 0.11947669088840485,
  'sequence': 'nagisa で 利用 できる モデル です',
  'token': 8252,
  'token_str': '利 用'},
 {'score': 0.04910655692219734,
  'sequence': 'nagisa で 作成 できる モデル です',
  'token': 9559,
  'token_str': '作 成'},
 {'score': 0.03792576864361763,
  'sequence': 'nagisa で 購入 できる モデル です',
  'token': 9430,
  'token_str': '購 入'},
 {'score': 0.026893319562077522,
  'sequence': 'nagisa で 入手 できる モデル です',
  'token': 11273,
  'token_str': '入 手'}]
```
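
If you only need the top candidates, recent versions of Transformers let you pass a `top_k` argument to the pipeline call (a minimal sketch reusing the objects created above; the argument name may differ in older releases):

```python
>>> best = fill_mask(text, top_k=1)[0]  # keep only the highest-scoring candidate
>>> print(best["sequence"])
nagisa で 使用 できる モデル です
```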

The tokenizer and model can also be used directly for tokenization and vectorization.

```python
>>> from transformers import BertModel
>>> from nagisa_bert import NagisaBertTokenizer

>>> text = "nagisaで[MASK]できるモデルです"
>>> tokenizer = NagisaBertTokenizer.from_pretrained("taishi-i/nagisa_bert")
>>> tokens = tokenizer.tokenize(text)
>>> print(tokens)
['na', '##g', '##is', '##a', 'で', '[MASK]', 'できる', 'モデル', 'です']

>>> model = BertModel.from_pretrained("taishi-i/nagisa_bert")
>>> h = model(**tokenizer(text, return_tensors="pt")).last_hidden_state
>>> print(h)
tensor([[[-0.2912, -0.6818, -0.4097,  ...,  0.0262, -0.3845,  0.5816],
         [ 0.2504,  0.2143,  0.5809,  ..., -0.5428,  1.1805,  1.8701],
         [ 0.1890, -0.5816, -0.5469,  ..., -1.2081, -0.2341,  1.0215],
         ...,
         [-0.4360, -0.2546, -0.2824,  ...,  0.7420, -0.2904,  0.3070],
         [-0.6598, -0.7607,  0.0034,  ...,  0.2982,  0.5126,  1.1403],
         [-0.2505, -0.6574, -0.0523,  ...,  0.9082,  0.5851,  1.2625]]],
       grad_fn=<NativeLayerNormBackward0>)
```
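
One common way to collapse these per-token vectors into a single sentence embedding is mean pooling over the attention mask (a sketch reusing `model` and `tokenizer` from above; the pooling strategy is an assumption, not part of nagisa_bert itself):

```python
>>> inputs = tokenizer(text, return_tensors="pt")
>>> h = model(**inputs).last_hidden_state          # (1, seq_len, hidden_size)
>>> mask = inputs["attention_mask"].unsqueeze(-1)  # (1, seq_len, 1)
>>> sentence_vector = (h * mask).sum(dim=1) / mask.sum(dim=1)  # average over real tokens only
>>> sentence_vector.shape                          # (1, hidden_size)
```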