---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - transformers
license: cc-by-sa-4.0
datasets:
  - klue
language:
  - ko
---

This model was trained with a multi-task loss (MultipleNegativeLoss -> AnglELoss).
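The actual training pipeline is not released; the snippet below is only a minimal sketch of how such a two-stage objective could be set up with the sentence-transformers `fit` API, assuming a `klue/roberta-large` backbone and hypothetical pair/score data.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Assumption: klue/roberta-large backbone; the real training data and hyperparameters are not specified here.
model = SentenceTransformer('klue/roberta-large')

# Stage 1: in-batch negatives over (anchor, positive) pairs
pair_examples = [InputExample(texts=['an anchor sentence', 'a semantically close sentence'])]  # hypothetical pairs
pair_loader = DataLoader(pair_examples, shuffle=True, batch_size=32)
mnr_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(pair_loader, mnr_loss)], epochs=1)

# Stage 2: AnglE loss over scored sentence pairs (labels in [0, 1])
scored_examples = [InputExample(texts=['sentence a', 'sentence b'], label=0.8)]  # hypothetical scored pairs
scored_loader = DataLoader(scored_examples, shuffle=True, batch_size=32)
angle_loss = losses.AnglELoss(model)
model.fit(train_objectives=[(scored_loader, angle_loss)], epochs=1)
```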

## Usage (HuggingFace Transformers)

```python
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoTokenizer, AutoModel

device = torch.device('cuda')
batch_size = 32

# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
model = AutoModel.from_pretrained('{MODEL_NAME}').to(device)
model.eval()

tokenized_data = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Wrap the tokenized tensors as a list of per-sentence dicts so the default collate_fn batches them
dataset = [{k: v[i] for k, v in tokenized_data.items()} for i in range(len(sentences))]
dataloader = DataLoader(dataset, batch_size=batch_size, pin_memory=True)
all_outputs = torch.zeros((len(sentences), model.config.hidden_size)).to(device)
start_idx = 0

# Mean pooling over token embeddings is used for the sentence representation
with torch.no_grad():
    for inputs in tqdm(dataloader):
        inputs = {k: v.to(device) for k, v in inputs.items()}
        representations, _ = model(**inputs, return_dict=False)
        attention_mask = inputs["attention_mask"]
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(representations.size()).to(representations.dtype)
        summed = torch.sum(representations * input_mask_expanded, 1)
        sum_mask = input_mask_expanded.sum(1)
        sum_mask = torch.clamp(sum_mask, min=1e-9)
        end_idx = start_idx + representations.shape[0]
        all_outputs[start_idx:end_idx] = summed / sum_mask
        start_idx = end_idx
```
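The mean-pooled vectors in `all_outputs` can then be compared directly; for example, a usage sketch continuing from the snippet above that prints the cosine similarity between the two example sentences:

```python
import torch.nn.functional as F

# L2-normalize the pooled embeddings, then take pairwise cosine similarities
embeddings = F.normalize(all_outputs, p=2, dim=1)
similarity = embeddings @ embeddings.T
print(similarity[0, 1].item())  # similarity between the two example sentences
```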

## Evaluation Results

| Organization | Backbone Model | KlueSTS average | KorSTS average |
|---|---|---|---|
| team-lucid | DeBERTa-base | 54.15 | 29.72 |
| monologg | Electra-base | 66.97 | 29.72 |
| LMkor | Electra-base | 70.98 | 43.09 |
| deliciouscat | DeBERTa-base | - | 67.65 |
| BM-K | Roberta-base | 82.93 | 85.77 |
| Klue | Roberta-large | 86.71 | 71.70 |
| Klue (Hyperparameter searched) | Roberta-large | 86.21 | 75.54 |

Noting that existing Korean sentence embedding models were trained on machine-translated English datasets such as MNLI and SNLI, we trained on the KLUE datasets instead.
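For reference, the KLUE STS and NLI subsets are available through the `datasets` library; the sketch below only shows how to load them, since the exact splits and preprocessing used for training are not specified here.

```python
from datasets import load_dataset

# KLUE STS: Korean sentence pairs with graded similarity labels
klue_sts = load_dataset("klue", "sts")
# KLUE NLI: premise/hypothesis pairs, usable for in-batch-negative style training
klue_nli = load_dataset("klue", "nli")

print(klue_sts["train"][0])
print(klue_nli["train"][0])
```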

As a result, when trained on top of the Klue-Roberta-large model, it showed solid performance on both the KlueSTS and KorSTS test sets, which we take to suggest that it forms a more elaborate representation.

Note, however, that the evaluation scores can vary considerably depending on hyperparameter settings, seed numbers, and so on, so please treat them as a reference point.
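STS benchmarks are typically scored by correlating cosine similarities with the gold labels; the helper below is a sketch of that evaluation, assuming Spearman correlation scaled to 0-100 (the exact metric behind the table above is not stated).

```python
import torch
import torch.nn.functional as F
from scipy.stats import spearmanr

def sts_score(emb1: torch.Tensor, emb2: torch.Tensor, gold_scores) -> float:
    """Spearman correlation (x100) between pairwise cosine similarities and gold STS labels."""
    cos_sim = (F.normalize(emb1, dim=1) * F.normalize(emb2, dim=1)).sum(dim=1)
    return spearmanr(cos_sim.cpu().numpy(), gold_scores).correlation * 100
```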

## Training

The model was trained with NegativeRank loss -> SimCSE loss.