giotvr/xlm_roberta_base_assin_fine_tuned

@@ -13,7 +13,7 @@ metrics:
 This is an XLM-RoBERTa-base model fine-tuned on 5K (premise, hypothesis) sentence pairs from
 the ASSIN (Avaliação de Similaridade Semântica e Inferência textual) corpus. Both the original corpus
-and XLM-RoBERTa-base model can be found here and the original reference papers are:
 Unsupervised Cross-Lingual Representation Learning At Scale (XLM-RoBERTa) and ASSIN: Avaliação de Similaridade Semântica e
 Inferência Textual (the corpus). This model is suitable for Portuguese (from Brazil or Portugal).
@@ -27,7 +27,7 @@ Inferência Textual, respectivelly. This model is suitable for Portuguese (from
 - **Developed by:** Giovani Tavares and Felipe Ribas Serras
 - **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
 - **Language(s) (NLP):** Portuguese
 - **License:** [More Information Needed]
 - **Finetuned from model [optional]:** [XLM-RoBERTa-base](https://huggingface.co/xlm-roberta-base)
@@ -37,7 +37,7 @@ Inferência Textual, respectivelly. This model is suitable for Portuguese (from
 <!-- Provide the basic links for the model. -->
 - **Repository:** [Natural-Portuguese-Language-Inference](https://github.com/giogvn/Natural-Portuguese-Language-Inference)
-- **Paper [optional]:** [More Information Needed]
 - **Demo [optional]:** [More Information Needed]
 ## Uses
@@ -80,18 +80,23 @@ Use the code below to get started with the model.
 [More Information Needed]
-## Training Details
-### Training Data
 <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
 [More Information Needed]
-### Training Procedure
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
 #### Preprocessing [optional]
 [More Information Needed]

 This is an XLM-RoBERTa-base model fine-tuned on 5K (premise, hypothesis) sentence pairs from
 the ASSIN (Avaliação de Similaridade Semântica e Inferência textual) corpus. Both the original corpus
+and the XLM-RoBERTa-base model can be found here. The original reference papers are
 Unsupervised Cross-Lingual Representation Learning At Scale (XLM-RoBERTa) and ASSIN: Avaliação de Similaridade Semântica e
 Inferência Textual (the corpus). This model is suitable for Portuguese (from Brazil or Portugal).
 - **Developed by:** Giovani Tavares and Felipe Ribas Serras
 - **Shared by [optional]:** [More Information Needed]
+- **Model type:** Transformer-based text classifier
 - **Language(s) (NLP):** Portuguese
 - **License:** [More Information Needed]
 - **Finetuned from model [optional]:** [XLM-RoBERTa-base](https://huggingface.co/xlm-roberta-base)
 <!-- Provide the basic links for the model. -->
 - **Repository:** [Natural-Portuguese-Language-Inference](https://github.com/giogvn/Natural-Portuguese-Language-Inference)
+- **Paper [optional]:** This is ongoing research. We are currently writing a paper describing our experiments.
 - **Demo [optional]:** [More Information Needed]
 ## Uses
 [More Information Needed]
+## Fine-Tuning Details
+### Fine-Tuning Data
 <!-- This should link to a Data Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+This model is a version of [XLM-RoBERTa-base](https://huggingface.co/xlm-roberta-base) fine-tuned on the [ASSIN (Avaliação de Similaridade Semântica e Inferência textual)](https://huggingface.co/datasets/assin)
+dataset. [ASSIN](https://huggingface.co/datasets/assin) is a corpus of Portuguese premise/hypothesis sentence pairs annotated for detecting an entailment, paraphrase or neutral
+relationship between the members of each pair. The corpus has three subsets: *ptbr* (Brazilian Portuguese), *ptpt* (European Portuguese) and *full* (the union of the two). The *full* subset has
+$10k$ sentence pairs equally distributed between the *ptbr* and *ptpt* subsets.
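As an illustration of how one of the subsets described above might be selected, here is a minimal sketch built on the `load_dataset` function of the `datasets` library. The helper `load_assin` and the constant `ASSIN_SUBSETS` are hypothetical names introduced for this example and are not part of the authors' code. The dataset ID `assin` is taken from the link above; depending on the installed `datasets` version, a different ID may be needed.

```python
# Hypothetical helper: not the authors' code, only a sketch.
# Subset names come from the model card: ptbr, ptpt and full.
ASSIN_SUBSETS = ("ptbr", "ptpt", "full")


def load_assin(subset="full"):
    """Load one ASSIN subset from the Hugging Face Hub.

    Raises ValueError for an unknown subset name before any download starts.
    """
    if subset not in ASSIN_SUBSETS:
        raise ValueError(
            f"Unknown ASSIN subset {subset!r}; expected one of {ASSIN_SUBSETS}"
        )
    # Imported lazily so that the name check above works even when
    # the `datasets` package is not installed.
    from datasets import load_dataset

    # Assumption: the dataset ID matches the URL linked in this card.
    return load_dataset("assin", subset)
```

Calling `load_assin("ptbr")` would then download only the Brazilian Portuguese pairs, while `load_assin("full")` would fetch the union of both variants.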
 [More Information Needed]
+### Fine-Tuning Procedure
 <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+The fine-tuning procedure can be summarized in three major sequential tasks:
+i
 #### Preprocessing [optional]
 [More Information Needed]