ccasimiro committed
Commit fa065c4
Parent: c1179b1

Update README.md

Files changed (1)
  1. README.md +30 -4
README.md CHANGED
@@ -61,6 +61,36 @@ We evaluated the _roberta-base-ca-cased-sts_ on the STS-ca test set against stan

 For more details, check the fine-tuning and evaluation scripts in the official [GitHub repository](https://github.com/projecte-aina/berta).

+ ## How to use
+ To get the model's correct<sup>1</sup> prediction scores, with values between 0.0 and 5.0, use the following code:
+
+ ```python
+ from transformers import pipeline, AutoTokenizer
+ from scipy.special import logit
+
+ model = 'projecte-aina/roberta-base-ca-cased-sts'
+ tokenizer = AutoTokenizer.from_pretrained(model)
+ pipe = pipeline('text-classification', model=model, tokenizer=tokenizer)
+
+ def prepare(sentence_pairs):
+     sentence_pairs_prep = []
+     for s1, s2 in sentence_pairs:
+         sentence_pairs_prep.append(f"{tokenizer.cls_token} {s1}{tokenizer.sep_token}{tokenizer.sep_token} {s2}{tokenizer.sep_token}")
+     return sentence_pairs_prep
+
+ sentence_pairs = [("El llibre va caure per la finestra.", "El llibre va sortir volant."),
+                   ("M'agrades.", "T'estimo."),
+                   ("M'agrada el sol i la calor", "A la Garrotxa plou molt.")]
+
+ predictions = pipe(prepare(sentence_pairs), add_special_tokens=False)
+
+ # the pipeline sigmoid-normalizes the raw score; logit maps it back to the original 0.0-5.0 scale
+ for prediction in predictions:
+     prediction['score'] = logit(prediction['score'])
+ print(predictions)
+ ```
+
+ <sup>1</sup> Avoid using the widget scores, since they are normalized and do not reflect the original annotation values.
 ## Citing
 If you use any of these resources (datasets or models) in your work, please cite our latest paper:
 ```bibtex
@@ -84,7 +114,3 @@ If you use any of these resources (datasets or models) in your work, please cite
 pages = "4933--4946",
 }
 ```
- ## Funding
- TODO
- ## Disclaimer
- TODO
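
As a sanity check on the conversion step above: `scipy.special.logit` is the exact inverse of the sigmoid (`scipy.special.expit`) that the `text-classification` pipeline applies to a single-label model's raw output, so a round trip recovers the raw value. A minimal sketch, independent of the model (the raw score 3.7 is an arbitrary illustrative value, not model output):

```python
# Round-trip check: logit is the inverse of the sigmoid (expit), so applying
# logit to a sigmoid-squashed score recovers the raw regression value.
from scipy.special import expit, logit

raw = 3.7                    # hypothetical raw similarity score on the 0.0-5.0 scale
squashed = expit(raw)        # what the pipeline reports as 'score'
recovered = logit(squashed)  # what the snippet above computes
assert abs(recovered - raw) < 1e-9
```

Relatedly, with a RoBERTa-style tokenizer (`cls_token` = `<s>`, `sep_token` = `</s>`), `prepare` yields pair inputs of the form `<s> sentence1</s></s> sentence2</s>`; this is why the snippet passes `add_special_tokens=False`, so the special tokens are not added a second time.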