Update README.md
---
language: ja
license: mit
datasets:
- wikipedia
---

# nagisa_bert

A BERT model for [nagisa](https://github.com/taishi-i/nagisa).
The model is available in [Transformers](https://github.com/huggingface/transformers) 🤗.

## Install

To use this model, you must install the following Python library.
You can install *nagisa_bert* with the *pip* command.

Python 3.7+ on Linux or macOS is required.

```bash
$ pip install nagisa_bert
```
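
A quick way to confirm the install worked is to import the tokenizer class used throughout this README. This check is just a convenience, not part of the original instructions; it downloads nothing.

```python
# Minimal post-install check: import the tokenizer class the README uses.
from nagisa_bert import NagisaBertTokenizer

print(NagisaBertTokenizer.__name__)  # -> "NagisaBertTokenizer"
```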

## Usage

This model is available through Transformers' `pipeline` method.

```python
>>> from transformers import pipeline
>>> from nagisa_bert import NagisaBertTokenizer

>>> text = "nagisaで[MASK]できるモデルです"
>>> tokenizer = NagisaBertTokenizer.from_pretrained("taishi-i/nagisa_bert")
>>> fill_mask = pipeline("fill-mask", model='taishi-i/nagisa_bert', tokenizer=tokenizer)
>>> print(fill_mask(text))
[{'score': 0.1385931372642517,
  'sequence': 'nagisa で 使用 できる モデル です',
  'token': 8092,
  'token_str': '使 用'},
 {'score': 0.11947669088840485,
  'sequence': 'nagisa で 利用 できる モデル です',
  'token': 8252,
  'token_str': '利 用'},
 {'score': 0.04910655692219734,
  'sequence': 'nagisa で 作成 できる モデル です',
  'token': 9559,
  'token_str': '作 成'},
 {'score': 0.03792576864361763,
  'sequence': 'nagisa で 購入 できる モデル です',
  'token': 9430,
  'token_str': '購 入'},
 {'score': 0.026893319562077522,
  'sequence': 'nagisa で 入手 できる モデル です',
  'token': 11273,
  'token_str': '入 手'}]
```
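
If you prefer not to use the pipeline, the same predictions can be reproduced with `BertForMaskedLM` directly. This is a minimal sketch using the standard Transformers masked-LM API; the top-5 cutoff simply mirrors the pipeline output above and is otherwise an arbitrary choice.

```python
import torch
from transformers import BertForMaskedLM
from nagisa_bert import NagisaBertTokenizer

tokenizer = NagisaBertTokenizer.from_pretrained("taishi-i/nagisa_bert")
model = BertForMaskedLM.from_pretrained("taishi-i/nagisa_bert")

# "nagisaで[MASK]できるモデルです" ~ "a model that can [MASK] with nagisa"
inputs = tokenizer("nagisaで[MASK]できるモデルです", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position, then take the five most likely tokens there.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top5 = torch.topk(logits[0, mask_pos[0]], k=5).indices
print(tokenizer.convert_ids_to_tokens(top5.tolist()))
```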

You can also call the tokenizer and model directly for tokenization and vectorization.

```python
>>> from transformers import BertModel
>>> from nagisa_bert import NagisaBertTokenizer

>>> text = "nagisaで[MASK]できるモデルです"
>>> tokenizer = NagisaBertTokenizer.from_pretrained("taishi-i/nagisa_bert")
>>> tokens = tokenizer.tokenize(text)
>>> print(tokens)
['na', '##g', '##is', '##a', 'で', '[MASK]', 'できる', 'モデル', 'です']

>>> model = BertModel.from_pretrained("taishi-i/nagisa_bert")
>>> h = model(**tokenizer(text, return_tensors="pt")).last_hidden_state
>>> print(h)
tensor([[[-0.2912, -0.6818, -0.4097,  ...,  0.0262, -0.3845,  0.5816],
         [ 0.2504,  0.2143,  0.5809,  ..., -0.5428,  1.1805,  1.8701],
         [ 0.1890, -0.5816, -0.5469,  ..., -1.2081, -0.2341,  1.0215],
         ...,
         [-0.4360, -0.2546, -0.2824,  ...,  0.7420, -0.2904,  0.3070],
         [-0.6598, -0.7607,  0.0034,  ...,  0.2982,  0.5126,  1.1403],
         [-0.2505, -0.6574, -0.0523,  ...,  0.9082,  0.5851,  1.2625]]],
       grad_fn=<NativeLayerNormBackward0>)
```
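
`last_hidden_state` gives one vector per token. If you want a single fixed-size vector for the whole sentence, one common convention (an assumption here, not something this model card prescribes) is mean pooling over the attention mask:

```python
import torch
from transformers import BertModel
from nagisa_bert import NagisaBertTokenizer

tokenizer = NagisaBertTokenizer.from_pretrained("taishi-i/nagisa_bert")
model = BertModel.from_pretrained("taishi-i/nagisa_bert")

inputs = tokenizer("nagisaで[MASK]できるモデルです", return_tensors="pt")
with torch.no_grad():
    h = model(**inputs).last_hidden_state        # (1, seq_len, hidden_size)

# Mean-pool over real tokens only, using the attention mask as weights.
mask = inputs["attention_mask"].unsqueeze(-1)    # (1, seq_len, 1)
sentence_vector = (h * mask).sum(dim=1) / mask.sum(dim=1)

# Assuming the usual BERT-base hidden size of 768:
print(sentence_vector.shape)                     # torch.Size([1, 768])
```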