Clarifying the competition's metric
Thank you for organizing this competition :) @srishtiy
I have a few questions about the competition's evaluation metric.
I hope the answers can help me to improve my local validation scheme.
- Which embeddings are being used to compute cosine similarity?
- Given that there are 7766 masks in the test set and the cosine similarity for each mask lies in [-1, 1], why are the scores on the LB so low? I assumed you compute a simple sum over the similarities. Is this assumption wrong?
Thanks for the questions, @gregorlied :) Please find the answers below:
| Which embeddings are being used to compute cosine similarity?
We use the cosine similarity of the predicted words with the ground truth; details here: https://huggingface.co./spaces/competitions/news-unmasked.
| Given that there are 7766 masks in the test set and the cosine similarity for each mask lies in [-1, 1], why are the scores on the LB so low? I assumed you compute a simple sum over the similarities. Is this assumption wrong?
Good question. Cosine similarities are indeed in [-1, 1]. However, for the final accuracy we apply a threshold of 0.5, i.e. if the cosine similarity between the predicted word and the masked word is >= 0.5, it is counted as a correct prediction (this helps give equal weight to words like "death" and "buried", for example). The final accuracy is (total correct predictions) / (total words). I will add these details where we mention the metrics. Apologies for missing this detail; it's now on the page.
A pseudocode example would be as below:
# cosine_sims = cosine similarities between each predicted/masked word pair
COSINE_SIM_THRESHOLD = 0.5

def calculate_accuracy(cosine_sims):
    # Count a prediction as correct when its cosine similarity
    # meets or exceeds the threshold.
    num_correct = 0
    total_pairs = 0
    for cosine_sim in cosine_sims:
        if cosine_sim >= COSINE_SIM_THRESHOLD:
            num_correct += 1
        total_pairs += 1
    # Accuracy as a percentage over all pairs
    return num_correct / total_pairs * 100
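For example, with three hypothetical similarity scores, two of which clear the 0.5 threshold:

scores = [0.9, 0.4, 0.6]           # hypothetical cosine similarities
print(calculate_accuracy(scores))  # 2 of 3 pairs pass -> ~66.67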
Hi @srishtiy - since cosine similarity can be used for any kind of embedding - is there a specific one used for this competition? Thanks!
@soarerz
@gregorlied
Apologies for missing the detail. We use en_core_web_lg from spaCy. Link: https://spacy.io/models/en#en_core_web_lg. Please let me know if that helps and if there are more questions.
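For anyone who wants to reproduce the similarity locally, here is a minimal sketch assuming the scores come straight from spaCy's en_core_web_lg word vectors (Token.similarity returns the cosine similarity of the vectors); the exact scoring pipeline may differ from this.

import spacy

# Requires: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")

predicted = nlp("buried")[0]  # predicted word as a spaCy Token
masked = nlp("death")[0]      # ground-truth masked word

# Token.similarity computes the cosine similarity of the word vectors
cosine_sim = predicted.similarity(masked)
print(cosine_sim)  # counted as correct if >= 0.5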