---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
license: cc-by-sa-4.0
datasets:
- klue
language:
- ko
---

๋ณธ ๋ชจ๋ธ์€ multi-task loss (MultipleNegativeLoss -> AnglELoss) ๋กœ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

## Usage (HuggingFace Transformers)

```python
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModel

device = torch.device('cuda')
batch_size = 32

# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}').to(device)
model.eval()

# Tokenize once, then batch the per-sentence features with a DataLoader
tokenized_data = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
dataset = [{k: v[i] for k, v in tokenized_data.items()} for i in range(len(sentences))]
dataloader = DataLoader(dataset, batch_size=batch_size, pin_memory=True)

all_outputs = torch.zeros((len(sentences), model.config.hidden_size), device=device)
start_idx = 0

# Mean pooling over the token embeddings is used for the sentence representation
with torch.no_grad():
    for inputs in tqdm(dataloader):
        inputs = {k: v.to(device) for k, v in inputs.items()}
        representations, _ = model(**inputs, return_dict=False)
        attention_mask = inputs['attention_mask']
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(representations.size()).to(representations.dtype)
        summed = torch.sum(representations * input_mask_expanded, 1)
        sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
        end_idx = start_idx + representations.shape[0]
        all_outputs[start_idx:end_idx] = summed / sum_mask
        start_idx = end_idx
```
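
The resulting `all_outputs` tensor holds one mean-pooled embedding per sentence; as a quick illustration, pairwise similarity can then be read off via cosine similarity:

```python
import torch.nn.functional as F

# Pairwise cosine similarity between the example sentences
embeddings = F.normalize(all_outputs, p=2, dim=1)
print(embeddings @ embeddings.T)
```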


## Evaluation Results

| Organization | Backbone Model | KlueSTS average | KorSTS average |
| -------- | ------- | ------- | ------- |
| team-lucid | DeBERTa-base | 54.15 | 29.72 |
| monologg | Electra-base | 66.97 | 29.72 |
| LMkor | Electra-base | 70.98 | 43.09 |
| deliciouscat | DeBERTa-base | - | 67.65 |
| BM-K    | Roberta-base | 82.93 | **85.77** |
| Klue    | Roberta-large | **86.71** | 71.70 |
| Klue (Hyperparameter searched) | Roberta-large | 86.21 | 75.54 |

Noting that existing Korean sentence embedding models were trained on machine-translated English datasets such as MNLI and SNLI, we trained on the Klue datasets instead.

As a result, the model trained on the Klue-Roberta-large backbone showed solid performance on both the KlueSTS and KorSTS test sets, which we take as evidence that it forms a more elaborate representation.

Note, however, that the evaluation scores can vary considerably with hyperparameter settings, random seeds, and so on, so please treat them as a reference only.
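
For context, STS benchmarks such as KlueSTS and KorSTS are commonly scored as the Spearman correlation between the model's cosine similarities and the gold labels; the helper below is a sketch under that assumption, as the card does not specify the exact evaluation script.

```python
import torch
import torch.nn.functional as F
from scipy.stats import spearmanr

def sts_spearman(emb_a: torch.Tensor, emb_b: torch.Tensor, gold: list[float]) -> float:
    """Spearman correlation (x100) between cosine similarities and gold scores."""
    cos = F.cosine_similarity(emb_a, emb_b).cpu().numpy()
    return spearmanr(cos, gold).correlation * 100
```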

## Training
The model was trained with NegativeRank loss -> SimCSE loss.
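
For reference, the SimCSE-style objective is the standard in-batch contrastive loss; below is a minimal PyTorch sketch of what it computes (the temperature value is an assumption):

```python
import torch
import torch.nn.functional as F

def simcse_loss(emb_a: torch.Tensor, emb_b: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """In-batch contrastive loss: row i of emb_a should match row i of emb_b;
    every other row in the batch acts as a negative."""
    sim = F.normalize(emb_a, dim=-1) @ F.normalize(emb_b, dim=-1).T / temperature
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)
```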