---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
license: cc-by-sa-4.0
datasets:
- klue
language:
- ko
---
This model was trained with a multi-task loss (MultipleNegativeLoss -> AnglELoss).
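## Usage (Sentence-Transformers)

The metadata above declares `library_name: sentence-transformers`, so the checkpoint can presumably also be loaded through that library. The snippet below is a minimal sketch under that assumption, reusing the `{MODEL_NAME}` placeholder from the next section.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('{MODEL_NAME}')

# Encode sentences into fixed-size embeddings
sentences = ['안녕하세요?', '한국어 문장 임베딩 모델입니다.']
embeddings = model.encode(sentences)
print(embeddings.shape)
```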
## Usage (HuggingFace Transformers)
```python
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModel
import torch

device = torch.device('cuda')
batch_size = 32

# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}').to(device)
model.eval()

tokenized_data = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Wrap the tokenized tensors as one dict per sentence so the default collate_fn can batch them
dataset = [{k: v[i] for k, v in tokenized_data.items()} for i in range(len(sentences))]
dataloader = DataLoader(dataset, batch_size=batch_size, pin_memory=True)

all_outputs = torch.zeros((len(sentences), model.config.hidden_size), device=device)
start_idx = 0

# Mean pooling over the token embeddings is used for the sentence representation
with torch.no_grad():
    for inputs in tqdm(dataloader):
        inputs = {k: v.to(device) for k, v in inputs.items()}
        representations, _ = model(**inputs, return_dict=False)
        attention_mask = inputs["attention_mask"]
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(representations.size()).to(representations.dtype)
        summed = torch.sum(representations * input_mask_expanded, 1)
        sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
        end_idx = start_idx + representations.shape[0]
        all_outputs[start_idx:end_idx] = summed / sum_mask
        start_idx = end_idx
```
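Once `all_outputs` holds the pooled embeddings, a straightforward way to score sentence similarity is cosine similarity, for example:

```python
import torch.nn.functional as F

# L2-normalize the mean-pooled embeddings; the dot product of unit vectors is the cosine similarity
embeddings = F.normalize(all_outputs, p=2, dim=1)
similarity = embeddings[0] @ embeddings[1]
print(f'cosine similarity: {similarity.item():.4f}')
```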
## Evaluation Results
Organization | Backbone Model | KlueSTS average | KorSTS average |
---|---|---|---|
team-lucid | DeBERTa-base | 54.15 | 29.72 |
monologg | Electra-base | 66.97 | 29.72 |
LMkor | Electra-base | 70.98 | 43.09 |
deliciouscat | DeBERTa-base | - | 67.65 |
BM-K | Roberta-base | 82.93 | 85.77 |
Klue | Roberta-large | 86.71 | 71.70 |
Klue (Hyperparameter searched) | Roberta-large | 86.21 | 75.54 |
Existing Korean sentence-embedding models were largely trained on machine translations of English datasets such as MNLI and SNLI. With that in mind, I trained this model on the KLUE datasets instead.
As a result, the model built on Klue-Roberta-large performed solidly on both the KlueSTS and KorSTS test sets, which I take to mean it learns a more elaborate representation.
Note, however, that the evaluation numbers can change considerably with the hyperparameter settings, random seed, and so on, so please treat them as a reference only.
## Training
The model was trained with NegativeRank loss -> SimCSE loss.
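The exact training script is not included in the card. Below is a minimal sketch of a two-stage setup of this kind with sentence-transformers, assuming the losses map to `MultipleNegativesRankingLoss` followed by `AnglELoss` (as named at the top of the card), that the backbone is `klue/roberta-large` per the evaluation table, and with a couple of hypothetical KLUE-style pairs standing in for the real training data.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Backbone assumed from the evaluation table; the real data and hyperparameters are not given in the card
model = SentenceTransformer('klue/roberta-large')

# Stage 1: (anchor, positive) pairs, e.g. KLUE NLI entailment pairs, trained with in-batch negatives
nli_pairs = [InputExample(texts=['한 남자가 피아노를 친다.', '남자가 악기를 연주하고 있다.'])]
nli_loader = DataLoader(nli_pairs, shuffle=True, batch_size=32)
model.fit(train_objectives=[(nli_loader, losses.MultipleNegativesRankingLoss(model))],
          epochs=1, warmup_steps=100)

# Stage 2: scored pairs, e.g. KLUE STS with labels rescaled to [0, 1], optimized with AnglELoss
sts_pairs = [InputExample(texts=['날씨가 맑다.', '오늘은 하늘이 맑다.'], label=0.9)]
sts_loader = DataLoader(sts_pairs, shuffle=True, batch_size=32)
model.fit(train_objectives=[(sts_loader, losses.AnglELoss(model))],
          epochs=1, warmup_steps=100)
```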