---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
license: cc-by-sa-4.0
datasets:
- klue
language:
- ko
---
This model was trained with a multi-task loss (MultipleNegativeLoss -> AnglELoss).
## Usage (HuggingFace Transformers)
```python
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModel

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}').to(device)
model.eval()

batch_size = 32
dataloader = DataLoader(sentences, batch_size=batch_size, pin_memory=True)
all_outputs = torch.zeros((len(sentences), model.config.hidden_size), device=device)
start_idx = 0

# Mean pooling over token representations gives the sentence representation
with torch.no_grad():
    for batch in tqdm(dataloader):
        inputs = tokenizer(batch, padding=True, truncation=True, return_tensors='pt')
        inputs = {k: v.to(device) for k, v in inputs.items()}
        representations = model(**inputs).last_hidden_state
        attention_mask = inputs['attention_mask']
        # Zero out padding tokens before averaging
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(representations.size()).to(representations.dtype)
        summed = torch.sum(representations * input_mask_expanded, dim=1)
        sum_mask = torch.clamp(input_mask_expanded.sum(dim=1), min=1e-9)
        end_idx = start_idx + representations.shape[0]
        all_outputs[start_idx:end_idx] = summed / sum_mask
        start_idx = end_idx
```
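The rows of `all_outputs` are the sentence embeddings; sentence similarity is then, for example, the cosine similarity between two rows (continuing from the snippet above):

```python
import torch.nn.functional as F

# Cosine similarity between the two example sentences above
similarity = F.cosine_similarity(all_outputs[0], all_outputs[1], dim=0)
print(f'similarity: {similarity.item():.4f}')
```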
## Evaluation Results
| Organization | Backbone Model | KlueSTS average | KorSTS average |
| -------- | ------- | ------- | ------- |
| team-lucid | DeBERTa-base | 54.15 | 29.72 |
| monologg | Electra-base | 66.97 | 29.72 |
| LMkor | Electra-base | 70.98 | 43.09 |
| deliciouscat | DeBERTa-base | - | 67.65 |
| BM-K | Roberta-base | 82.93 | **85.77** |
| Klue | Roberta-large | **86.71** | 71.70 |
| Klue (Hyperparameter searched) | Roberta-large | 86.21 | 75.54 |
Existing Korean sentence-embedding models have typically been trained on machine translations of English datasets such as MNLI and SNLI. Taking that as a reference point, we trained on the Klue datasets instead.
As a result, the model trained on top of Klue-Roberta-large performed solidly on both the KlueSTS and KorSTS test sets, which we take to suggest that it forms a more elaborate representation.
Note, however, that these evaluation figures can vary considerably with hyperparameter settings, random seed, and so on.
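For reference, STS benchmarks such as KorSTS are conventionally scored as 100 × Spearman correlation between the model's cosine similarities and the human-rated gold scores. The card does not state the metric, so the helper below is a sketch under that assumption:

```python
import torch
import torch.nn.functional as F
from scipy.stats import spearmanr

def sts_score(emb_a: torch.Tensor, emb_b: torch.Tensor, gold_scores) -> float:
    """100 x Spearman correlation between cosine similarities and gold labels.

    emb_a / emb_b: (N, hidden_size) sentence embeddings for the two sides of
    each test pair (e.g. produced by the mean-pooling loop above);
    gold_scores: N human similarity ratings.
    """
    cosine = F.cosine_similarity(emb_a, emb_b, dim=1)
    corr, _ = spearmanr(cosine.cpu().numpy(), gold_scores)
    return 100 * corr
```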
## Training
The model was trained with NegativeRank loss followed by simcse loss; a sketch of this two-stage recipe follows.
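A minimal sketch of one such two-stage run with the `sentence-transformers` fit API, assuming "NegativeRank loss" maps to `MultipleNegativesRankingLoss` and using `AnglELoss` (named at the top of this card) for the second stage; the base checkpoint and the toy examples are placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer('klue/roberta-large')  # placeholder base checkpoint

# Stage 1: in-batch ranking loss on (anchor, positive) pairs,
# e.g. entailment pairs drawn from KLUE NLI (toy data below)
rank_examples = [InputExample(texts=['날씨가 좋다', '오늘은 맑은 날이다'])]
rank_loader = DataLoader(rank_examples, batch_size=32, shuffle=True)
rank_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(rank_loader, rank_loss)], epochs=1)

# Stage 2: angle-optimized STS loss on scored pairs,
# e.g. KLUE STS with labels scaled to [0, 1] (toy data below)
sts_examples = [InputExample(texts=['날씨가 좋다', '날씨가 나쁘다'], label=0.1)]
sts_loader = DataLoader(sts_examples, batch_size=32, shuffle=True)
sts_loss = losses.AnglELoss(model)
model.fit(train_objectives=[(sts_loader, sts_loss)], epochs=1)
```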