Commit cdaa964 (verified) by aken12
1 parent: 1b0df08

Update README.md

Files changed (1)
  1. README.md +0 -44
README.md CHANGED
@@ -7,50 +7,6 @@ language:
  - ja
  ---
 
- SPLADE-japanese-v2 !!
-
- Difference between splade-japanese v1 and v2
- - initialize [tohoku-nlp/bert-base-japanese-v3](https://huggingface.co/tohoku-nlp/bert-base-japanese-v3)
- - knowledge distillation from cross-encoder
- - [mMARCO](https://github.com/unicamp-dl/mMARCO) Japanese dataset and use [bclavie/mmarco-japanese-hard-negatives](https://huggingface.co/datasets/bclavie/mmarco-japanese-hard-negatives) as hard negatives
-
-
- you need to install
- '''
- !pip install fugashi ipadic unidic-lite
- '''
-
- ```python
- from transformers import AutoModelForMaskedLM,AutoTokenizer
- import torch
- import numpy as np
-
- model = AutoModelForMaskedLM.from_pretrained("aken12/splade-japanesev2-epoch5")
- tokenizer = AutoTokenizer.from_pretrained("aken12/splade-japanesev2-epoch5")
- vocab_dict = {v: k for k, v in tokenizer.get_vocab().items()}
-
- def encode_query(query):
-     query = tokenizer(query, return_tensors="pt")
-     output = model(**query, return_dict=True).logits
-     output, _ = torch.max(torch.log(1 + torch.relu(output)) * query['attention_mask'].unsqueeze(-1), dim=1)
-     return output
-
- with torch.no_grad():
-     model_output = encode_query(query="筑波大学では何の研究が行われているか?")
-
- reps = model_output
- idx = torch.nonzero(reps[0], as_tuple=False)
-
- dict_splade = {}
- for i in idx:
-     token_value = reps[0][i[0]].item()
-     if token_value > 0:
-         token = vocab_dict[int(i[0])]
-         dict_splade[token] = float(token_value)
-
- sorted_dict_splade = sorted(dict_splade.items(), key=lambda item: item[1], reverse=True)
- for token, value in sorted_dict_splade:
-     print(token, value)
 
  ```
 
 