Update README.md
README.md
CHANGED
@@ -13,7 +13,7 @@ metrics:
 
 This is an XLM-RoBERTa-base model fine-tuned on 5K (premise, hypothesis) sentence pairs from
 the ASSIN (Avaliação de Similaridade Semântica e Inferência Textual) corpus. Both the original corpus
-and XLM-RoBERTa-base model can be found here
+and XLM-RoBERTa-base model can be found here. The original reference papers are:
 Unsupervised Cross-Lingual Representation Learning At Scale, ASSIN: Avaliação de Similaridade Semântica e
 Inferência Textual, respectively. This model is suitable for Portuguese (from Brazil or Portugal).
 
@@ -27,7 +27,7 @@ Inferência Textual, respectively. This model is suitable for Portuguese (from
 
 - **Developed by:** Giovani Tavares and Felipe Ribas Serras
 - **Shared by [optional]:** [More Information Needed]
-- **Model type:**
+- **Model type:** Transformer-based text classifier
 - **Language(s) (NLP):** Portuguese
 - **License:** [More Information Needed]
 - **Finetuned from model [optional]:** [XLM-RoBERTa-base](https://huggingface.co/xlm-roberta-base)
@@ -37,7 +37,7 @@ Inferência Textual, respectively. This model is suitable for Portuguese (from
 <!-- Provide the basic links for the model. -->
 
 - **Repository:** [Natural-Portuguese-Language-Inference](https://github.com/giogvn/Natural-Portuguese-Language-Inference)
-- **Paper [optional]:**
+- **Paper [optional]:** This is ongoing research. We are currently writing a paper describing our experiments.
 - **Demo [optional]:** [More Information Needed]
 
 ## Uses
@@ -80,18 +80,23 @@ Use the code below to get started with the model.
 
 [More Information Needed]
 
-##
+## Fine-Tuning Details
 
-###
+### Fine-Tuning Data
 
 <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 
+This is a fine-tuned version of [XLM-RoBERTa-base](https://huggingface.co/xlm-roberta-base) using the [ASSIN (Avaliação de Similaridade Semântica e Inferência Textual)](https://huggingface.co/datasets/assin)
+dataset. [ASSIN](https://huggingface.co/datasets/assin) is a corpus of annotated Portuguese hypothesis/premise sentence pairs suitable for detecting entailment, paraphrase, or neutral
+relationships between the members of each pair. The corpus has three subsets: *ptbr* (Brazilian Portuguese), *ptpt* (European Portuguese) and *full* (the union of the two). The *full* subset has
+$10k$ sentence pairs equally distributed between the *ptbr* and *ptpt* subsets.
 [More Information Needed]
 
-###
+### Fine-Tuning Procedure
 
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
+The fine-tuning procedure can be summarized in three major sequential tasks:
+i
 #### Preprocessing [optional]
 
 [More Information Needed]
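The three ASSIN subsets named in the Fine-Tuning Data hunk (*ptbr*, *ptpt* and *full*) could be loaded roughly as sketched below. This is a minimal illustration, not the authors' code: it assumes the Hugging Face `datasets` library, the dataset identifier `assin` taken from the link in the README, and that each row exposes `premise` and `hypothesis` fields. The helper names `load_assin` and `to_text_pair` are hypothetical.

```python
# Subset names as listed in the README's dataset description.
ASSIN_SUBSETS = ("ptbr", "ptpt", "full")


def load_assin(subset="full"):
    """Load one ASSIN subset from the Hugging Face Hub (needs network access)."""
    if subset not in ASSIN_SUBSETS:
        raise ValueError(f"unknown ASSIN subset {subset!r}; expected one of {ASSIN_SUBSETS}")
    # Imported lazily so the helpers can be defined without `datasets` installed.
    from datasets import load_dataset

    # Assumption: "assin" is the dataset identifier behind the README link.
    return load_dataset("assin", subset)


def to_text_pair(example):
    """Return the (premise, hypothesis) pair for a single example.

    Assumption: rows carry "premise" and "hypothesis" fields.
    """
    return example["premise"], example["hypothesis"]
```

For instance, `load_assin("ptbr")` would fetch only the Brazilian Portuguese subset, and each pair returned by `to_text_pair` can be passed to a tokenizer as two text arguments so that premise and hypothesis are encoded together.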