Clarifying the competition's metric

#3
by gregorlied - opened

Thank you for organizing this competition :) @srishtiy

I have a few questions about the competition's evaluation metric.
I hope the answers can help me to improve my local validation scheme.

  • Which embeddings are being used to compute cosine similarity?
  • Given that there are 7766 masks in the test set and the cosine similarity for each mask is in [-1, 1],
    why are the scores on the LB so low? I assumed you compute a simple sum over the similarities. Is this assumption wrong?

Thanks for the questions, @gregorlied :) Please find the answers below:

| Which embeddings are being used to compute cosine similarity?
We use the cosine similarity of the predicted words with the ground truth; details here: https://huggingface.co./spaces/competitions/news-unmasked.

| Given that there are 7766 masks in the test set and the cosine similarity for each mask is in [-1, 1],
| why are the scores on the LB so low? I assumed you compute a simple sum over the similarities. Is this assumption wrong?
Good question. Cosine similarities are indeed in [-1, 1]. However, for the final accuracy we apply a threshold of 0.5, i.e. if the cosine similarity between the predicted word and the masked word is >= 0.5, it is counted as a correct prediction (this helps give equal weight to words like "death" and "buried", for example). The final accuracy is (total correct predictions) / (total words). I will add these details where we mention the metrics. Apologies for missing this detail; it's now on the page.

A pseudocode snippet example would be as below:

# cosine_sims = list of cosine similarities between the predicted and masked words
# cosine_sim_threshold = 0.5

def calculate_accuracy(cosine_sims, cosine_sim_threshold=0.5):
    # Count a prediction as correct when its cosine similarity
    # with the ground-truth word meets the threshold
    num_correct = 0
    total_pairs = 0
    for cosine_sim in cosine_sims:
        if cosine_sim >= cosine_sim_threshold:
            num_correct += 1
        total_pairs += 1
    # Accuracy as a percentage over all predicted/masked word pairs
    accuracy = num_correct / total_pairs * 100
    return accuracy

Hi @srishtiy - since cosine similarity can be computed with any kind of embedding, is there a specific one used for this competition? Thanks!

@soarerz @gregorlied Apologies for missing the detail. We use en_core_web_lg from spaCy. Link: https://spacy.io/models/en#en_core_web_lg. Please let me know if that helps and if there are more questions.
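
For anyone who wants to mirror the metric in a local validation scheme, here is a minimal sketch, assuming word-level vectors from en_core_web_lg and the 0.5 threshold described above; the word pairs are hypothetical and the official evaluation's preprocessing may differ:

import spacy

# Requires the model: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")

def word_similarity(predicted, masked):
    # Cosine similarity between the two words' en_core_web_lg vectors
    return nlp(predicted)[0].similarity(nlp(masked)[0])

# Hypothetical predicted / ground-truth pairs, for illustration only
pairs = [("death", "buried"), ("mayor", "mayor"), ("cat", "election")]
sims = [word_similarity(pred, true) for pred, true in pairs]

# Thresholded accuracy, as in the pseudocode above
accuracy = sum(s >= 0.5 for s in sims) / len(sims) * 100
print(f"local accuracy: {accuracy:.1f}%")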
