---
license: mit
datasets:
- web_nlg
language:
- en
widget:
- text: "Bourg-la-Reine is located in France and I love this town. I'm from People's Republic of China. [SEP] A Chinese, Loves, Bourg-la-Reine"
- text: "Bucharest is a city in Romania. [SEP] Romania | is located in | Bucharest"

---

# Model card for Inria-CEDAR/FactSpotter-DeBERTaV3-Large

## Model description

This model accompanies the paper **"FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation"**.

Given a triple in the format "subject, predicate, object" and a text, the model determines whether the triple is expressed in the text.

Within the triple, the delimiter can be ", " or " | "; a small formatting example follows.
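For illustration, here is a minimal sketch of building such a (text, triple) input pair; the `make_pair` helper is only illustrative and not part of the released code:

```python
def make_pair(text, subj, pred, obj, delim=" | "):
    # Serialize the triple as "subject<delim>predicate<delim>object"
    return (text, delim.join([subj, pred, obj]))

pair = make_pair("Bucharest is a city in Romania.",
                 "Romania", "is located in", "Bucharest")
# -> ("Bucharest is a city in Romania.", "Romania | is located in | Bucharest")
```

Such pairs are what the classification snippet below expects as input.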

Unlike the paper, which uses ELECTRA, this model is fine-tuned from DeBERTaV3.

We also provide Base and Small variants:

https://huggingface.co./Inria-CEDAR/FactSpotter-DeBERTaV3-Base

https://huggingface.co./Inria-CEDAR/FactSpotter-DeBERTaV3-Small

## How to use the model

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def sentence_cls_score(input_strings, predicate_cls_model, predicate_cls_tokenizer):
    # Tokenize (text, triple) pairs; the tokenizer joins each pair with [SEP]
    tokenized_cls_input = predicate_cls_tokenizer(input_strings, truncation=True, padding=True,
                                                  return_token_type_ids=True)
    input_ids = torch.tensor(tokenized_cls_input['input_ids']).long().to(torch.device("cuda"))
    token_type_ids = torch.tensor(tokenized_cls_input['token_type_ids']).long().to(torch.device("cuda"))
    attention_mask = torch.tensor(tokenized_cls_input['attention_mask']).long().to(torch.device("cuda"))
    with torch.no_grad():  # inference only, no gradients needed
        prev_cls_output = predicate_cls_model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
    # Softmax over the logits yields per-class probabilities
    softmax_cls_output = torch.softmax(prev_cls_output.logits, dim=1)
    return softmax_cls_output


tokenizer = AutoTokenizer.from_pretrained("Inria-CEDAR/FactSpotter-DeBERTaV3-Large")
model = AutoModelForSequenceClassification.from_pretrained("Inria-CEDAR/FactSpotter-DeBERTaV3-Large")
model.to(torch.device("cuda"))  # the helper above also assumes a CUDA device

# pairs of texts (as premises) and triples (as hypotheses)
cls_texts = [("the aarhus is the airport of aarhus, denmark", "aarhus airport, city served, aarhus, denmark"),
             ("aarhus airport is 25.0 metres above the sea level", "aarhus airport, elevation above the sea level, 1174")]
cls_scores = sentence_cls_score(cls_texts, model, tokenizer)
# Dimensions: 0-entailment, 1-neutral, 2-contradiction
label_names = ["entailment", "neutral", "contradiction"]
```
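To read off a discrete label for each pair, take the argmax over the class dimension and look it up in `label_names`; a minimal sketch continuing the snippet above:

```python
# Pick the most probable class for each (text, triple) pair
predictions = cls_scores.argmax(dim=1).tolist()
for (text, triple), idx in zip(cls_texts, predictions):
    print(f"'{triple}' vs. '{text}' -> {label_names[idx]}")
```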
## Citation
If the model is useful to you, please cite the paper:

```
@inproceedings{zhang:hal-04257838,
  TITLE = {{FactSpotter: Evaluating the Factual Faithfulness of Graph-to-Text Generation}},
  AUTHOR = {Zhang, Kun and Balalau, Oana and Manolescu, Ioana},
  URL = {https://hal.science/hal-04257838},
  BOOKTITLE = {{Findings of EMNLP 2023 - Conference on Empirical Methods in Natural Language Processing}},
  ADDRESS = {Singapore, Singapore},
  YEAR = {2023},
  MONTH = Dec,
  KEYWORDS = {Graph-to-Text Generation ; Factual Faithfulness ; Constrained Text Generation},
  PDF = {https://hal.science/hal-04257838/file/_EMNLP_2023__Evaluating_the_Factual_Faithfulness_of_Graph_to_Text_Generation_Camera.pdf},
  HAL_ID = {hal-04257838},
  HAL_VERSION = {v1},
}
```

## Questions
If you have any questions, please contact me at [email protected] or [email protected].