alexandrainst/scandi-nli-small

@@ -66,9 +66,28 @@ You can use this model in your scripts as follows:
 ## Performance
-As Danish is, as far as we are aware, the only Scandinavian language with a gold standard NLI dataset, namely the [DanFEVER dataset](https://aclanthology.org/2021.nodalida-main.pdf#page=439), we report evaluation scores on the test split of that dataset.
-We report Matthew's Correlation Coefficient (MCC), macro-average F1-score as well as accuracy.
 | **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
 | :-------- | :------------ | :--------- | :----------- | :----------- |
@@ -80,6 +99,39 @@ We report Matthew's Correlation Coefficient (MCC), macro-average F1-score as wel
 | `alexandrainst/scandi-nli-small` (this) | 47.28% | 48.88% | 73.46% | **22M** |
 ## Training procedure
 It has been fine-tuned on a dataset composed of [DanFEVER](https://aclanthology.org/2021.nodalida-main.pdf#page=439) as well as machine translated versions of [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) and [CommitmentBank](https://doi.org/10.18148/sub/2019.v23i2.601) into all three languages, and machine translated versions of [FEVER](https://aclanthology.org/N18-1074/) and [Adversarial NLI](https://aclanthology.org/2020.acl-main.441/) into Swedish.

 ## Performance
+We evaluate the models in Danish, Swedish and Norwegian Bokmål separately.
+In all cases, we report the Matthews correlation coefficient (MCC), the macro-averaged F1-score and accuracy.
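As an illustration of the three reported metrics, the following is a minimal pure-Python sketch. It is not the authors' evaluation code, which the source does not show; the function names and the toy labels are hypothetical, and the MCC uses the standard multiclass generalisation. In practice a library such as scikit-learn (`matthews_corrcoef`, `f1_score` with `average="macro"`, `accuracy_score`) would normally be used instead.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes an F1 of 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(c * c for c in true_counts.values())
    cov_pp = n * n - sum(c * c for c in pred_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    # By convention, return 0 when the coefficient is undefined.
    return cov_tp / denom if denom else 0.0


# Hypothetical toy labels, for illustration only.
y_true = ["entailment", "neutral", "contradiction",
          "entailment", "neutral", "contradiction"]
y_pred = ["entailment", "neutral", "neutral",
          "entailment", "contradiction", "contradiction"]

print(f"MCC: {mcc(y_true, y_pred):.2%}")
print(f"Macro-F1: {macro_f1(y_true, y_pred):.2%}")
print(f"Accuracy: {accuracy(y_true, y_pred):.2%}")
```

MCC and macro-F1 are both sensitive to class imbalance, whereas accuracy is not. This is one reason why a model's accuracy can sit well above its other two scores.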
+### Scandinavian Evaluation
+The Scandinavian scores are the average of the Danish, Swedish and Norwegian scores, which can be found in the sections below.
+| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
+| :-------- | :------------ | :--------- | :----------- | :----------- |
+| [`alexandrainst/scandi-nli-large`](https://huggingface.co/alexandrainst/scandi-nli-large) | asd | asd | asd | 354M |
+| [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | asd | asd | asd | 279M |
+| [`alexandrainst/scandi-nli-base`](https://huggingface.co/alexandrainst/scandi-nli-base) | asd | asd | asd | 178M |
+| [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 63.94% | 70.41% | 77.23% | 279M |
+| [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | asd | asd | asd | 560M |
+| `alexandrainst/scandi-nli-small` (this) | asd | asd | asd | **22M** |
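The averaging behind the Scandinavian scores can be sketched as follows. This assumes a plain arithmetic mean over the three languages; the function name, the dictionary keys and all numbers are hypothetical and are not results from this model card.

```python
from statistics import mean


def scandinavian_scores(per_language):
    """Average each metric over the per-language score dictionaries."""
    metrics = ("mcc", "macro_f1", "accuracy")
    return {
        metric: mean(scores[metric] for scores in per_language.values())
        for metric in metrics
    }


# Hypothetical per-language scores in percent, for illustration only.
example_scores = {
    "danish": {"mcc": 50.0, "macro_f1": 55.0, "accuracy": 70.0},
    "swedish": {"mcc": 70.0, "macro_f1": 80.0, "accuracy": 80.0},
    "norwegian": {"mcc": 60.0, "macro_f1": 75.0, "accuracy": 75.0},
}

averaged = scandinavian_scores(example_scores)
for metric, value in averaged.items():
    print(f"{metric}: {value:.2f}%")
```

Because each language carries equal weight in this sketch, the size of the individual test sets does not affect the averaged score.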
+### Danish Evaluation
+We use a test split of the [DanFEVER dataset](https://aclanthology.org/2021.nodalida-main.pdf#page=439) to evaluate the Danish performance of the models.
 | **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
 | :-------- | :------------ | :--------- | :----------- | :----------- |
 | `alexandrainst/scandi-nli-small` (this) | 47.28% | 48.88% | 73.46% | **22M** |
+### Swedish Evaluation
+We use the test split of the machine-translated version of the [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) dataset to evaluate the Swedish performance of the models.
+We acknowledge that evaluating on a machine-translated dataset rather than a gold standard one is not ideal, but unfortunately we are not aware of any gold standard NLI datasets in Swedish.
+| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
+| :-------- | :------------ | :--------- | :----------- | :----------- |
+| [`alexandrainst/scandi-nli-large`](https://huggingface.co/alexandrainst/scandi-nli-large) | asd | asd | asd | 354M |
+| [`alexandrainst/scandi-nli-base`](https://huggingface.co/alexandrainst/scandi-nli-base) | asd | asd | asd | 178M |
+| [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 73.84% | 82.46% | 82.58% | 279M |
+| [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 73.32% | 82.15% | 82.08% | 279M |
+| [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | asd | asd | asd | 560M |
+| `alexandrainst/scandi-nli-small` (this) | asd | asd | asd | **22M** |
+### Norwegian Evaluation
+We use the test split of the machine-translated version of the [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) dataset to evaluate the Norwegian performance of the models.
+We acknowledge that evaluating on a machine-translated dataset rather than a gold standard one is not ideal, but unfortunately we are not aware of any gold standard NLI datasets in Norwegian.
+| **Model** | **MCC** | **Macro-F1** | **Accuracy** | **Number of Parameters** |
+| :-------- | :------------ | :--------- | :----------- | :----------- |
+| [`alexandrainst/scandi-nli-large`](https://huggingface.co/alexandrainst/scandi-nli-large) | asd | asd | asd | 354M |
+| [`MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7) | 65.33% | 76.73% | 76.65% | 279M |
+| [`alexandrainst/scandi-nli-base`](https://huggingface.co/alexandrainst/scandi-nli-base) | asd | asd | asd | 178M |
+| [`NbAiLab/nb-bert-base-mnli`](https://huggingface.co/NbAiLab/nb-bert-base-mnli) | asd | asd | asd | 178M |
+| [`MoritzLaurer/mDeBERTa-v3-base-mnli-xnli`](https://huggingface.co/MoritzLaurer/mDeBERTa-v3-base-mnli-xnli) | 65.18% | 76.76% | 76.77% | 279M |
+| [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli) | asd | asd | asd | 560M |
+| `alexandrainst/scandi-nli-small` (this) | asd | asd | asd | **22M** |
 ## Training procedure
 It has been fine-tuned on a dataset composed of [DanFEVER](https://aclanthology.org/2021.nodalida-main.pdf#page=439) as well as machine translated versions of [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/) and [CommitmentBank](https://doi.org/10.18148/sub/2019.v23i2.601) into all three languages, and machine translated versions of [FEVER](https://aclanthology.org/N18-1074/) and [Adversarial NLI](https://aclanthology.org/2020.acl-main.441/) into Swedish.