---
datasets:
- assin
language:
- pt
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- nli
---
# Model Card for portuguese-nli-3-labels
This is an **[XLM-RoBERTa-base](https://huggingface.co./xlm-roberta-base) model fine-tuned** on 5K (premise, hypothesis) sentence pairs from
the **ASSIN (Avaliação de Similaridade Semântica e Inferência Textual, "Semantic Similarity and Textual Inference Evaluation")** corpus. The reference papers for the base model and the corpus are
[Unsupervised Cross-Lingual Representation Learning at Scale](https://arxiv.org/pdf/1911.02116) and [ASSIN: Avaliação de Similaridade Semântica e Inferência Textual](https://huggingface.co./datasets/assin), respectively. This model is suitable for Portuguese (from Brazil or Portugal).
## Model Details
### Model Description
- **Developed by:** Giovani Tavares and Felipe Ribas Serras
- **Advised by:** Renata Wassermann, Felipe Ribas Serras and Marcelo Finger
- **Model type:** Transformer-based text classifier
- **Language(s) (NLP):** Portuguese
- **License:** MIT
- **Finetuned from model:** [XLM-RoBERTa-base](https://huggingface.co./xlm-roberta-base)
### Model Sources
- **Repository:** [Natural-Portuguese-Language-Inference](https://github.com/giogvn/Natural-Portuguese-Language-Inference)
- **Paper:** This is ongoing research; a paper fully describing the experiments is in preparation.
## Uses
### Direct Use
This fine-tuned version of [XLM-RoBERTa-base](https://huggingface.co./xlm-roberta-base) performs Natural
Language Inference (NLI), a text classification task: it classifies sentence pairs of the form (premise, hypothesis) into one of three classes, ENTAILMENT, PARAPHRASE or NONE. Salvatore's definition [1] of ENTAILMENT is assumed to match the one underlying the labels of the [ASSIN](https://huggingface.co./datasets/assin) corpus on which this model was trained.
PARAPHRASE and NONE are not defined in [1]. It is therefore assumed that, in this model's training set, given a pair (premise, hypothesis), the hypothesis is a PARAPHRASE of the premise if the premise entails the hypothesis *and* vice versa. If the pair has neither an ENTAILMENT nor a PARAPHRASE relationship, it is classified as NONE.
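The label logic described above can be sketched as a small helper that derives the three-way label from the two directional entailment judgments (`nli_label` is a hypothetical name used for illustration, not part of this model's code):

```python
def nli_label(premise_entails_hyp: bool, hyp_entails_premise: bool) -> str:
    """Derive the three-way ASSIN-style label from directional
    entailment judgments between a premise and a hypothesis."""
    if premise_entails_hyp and hyp_entails_premise:
        return "PARAPHRASE"  # entailment holds in both directions
    if premise_entails_hyp:
        return "ENTAILMENT"  # premise entails hypothesis only
    return "NONE"            # no entailment relationship


print(nli_label(True, True))   # PARAPHRASE
print(nli_label(True, False))  # ENTAILMENT
```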
## Demo
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_path = "giotvr/portuguese-nli-3-labels"

premise = "As mudanças climáticas são uma ameaça séria para a biodiversidade do planeta."
hypothesis = "A biodiversidade do planeta é seriamente ameaçada pelas mudanças climáticas."

tokenizer = AutoTokenizer.from_pretrained(model_path, use_auth_token=True)
model = AutoModelForSequenceClassification.from_pretrained(model_path, use_auth_token=True)

# Encode the (premise, hypothesis) pair as a single input sequence
input_pair = tokenizer(premise, hypothesis, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    logits = model(**input_pair).logits
    probs = torch.nn.functional.softmax(logits, dim=-1)
    probs, sorted_indices = torch.sort(probs, descending=True)

for i, score in enumerate(probs[0]):
    print(f"Class {sorted_indices[0][i]}: {score.item():.4f}")
```
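The class indices printed by the demo can be mapped to human-readable labels via the model's `id2label` configuration (`model.config.id2label`). A minimal, framework-free sketch of that post-processing step (`label_scores` is a hypothetical helper introduced here for illustration):

```python
import math


def label_scores(logits, id2label):
    """Softmax raw logits and pair each probability with its
    human-readable label, sorted from most to least likely.

    `id2label` is the model's index-to-label mapping, available at
    inference time as `model.config.id2label`.
    """
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sorted(
        ((id2label[i], p) for i, p in enumerate(probs)),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

In the demo above it could be called as `label_scores(logits[0].tolist(), model.config.id2label)`.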
### Recommendations
This model is intended for research purposes only; it has not been tested for production environments.
## Fine-Tuning Details
### Fine-Tuning Data
---
- **Train Dataset**: [ASSIN](https://huggingface.co./datasets/assin)
- **Evaluation Dataset used for Hyperparameter Tuning:** [ASSIN](https://huggingface.co./datasets/assin)'s validation split
- **Test Datasets:**
- [ASSIN](https://huggingface.co./datasets/assin)'s test splits
  - [ASSIN2](https://huggingface.co./datasets/assin2)'s test split
---
This is a fine-tuned version of [XLM-RoBERTa-base](https://huggingface.co./xlm-roberta-base) trained on the [ASSIN (Avaliação de Similaridade Semântica e Inferência Textual)](https://huggingface.co./datasets/assin) dataset. [ASSIN](https://huggingface.co./datasets/assin) is a corpus of Portuguese (premise, hypothesis) sentence pairs annotated for entailment, paraphrase or no relationship between the members of each pair. The corpus has three subsets: *ptbr* (Brazilian Portuguese), *ptpt* (European Portuguese) and *full* (the union of the two). The *full* subset has
10K sentence pairs, equally distributed between the *ptbr* and *ptpt* subsets.
### Fine-Tuning Procedure
The model's fine-tuning procedure can be summarized in three sequential stages: