mmarimon committed on
Commit 1179280
1 Parent(s): 07d0679

Update README.md

Files changed (1)
  1. README.md +53 -1
@@ -51,7 +51,7 @@ widget:
 
 # Catalan BERTa-v2 (roberta-base-ca-v2) finetuned for Named Entity Recognition.
 
- ## Table of ContentsContents
+ ## Table of Contents
 <details>
 <summary>Click to expand</summary>
 
@@ -117,6 +117,58 @@ The model was trained with a batch size of 16 and a learning rate of 5e-5 for 5
 This model was finetuned maximizing F1 score.
 
 ### Evaluation results
+
+ # Catalan BERTa-v2 (roberta-base-ca-v2) finetuned for Part-of-speech tagging (POS)
+
+ ## Table of Contents
+ <details>
+ <summary>Click to expand</summary>
+ </details>
+
+ ## Model description
+
+ The _roberta-base-ca-v2-cased-pos_ is a Part-of-speech tagging (POS) model for the Catalan language, fine-tuned from the roberta-base-ca-v2 model, a RoBERTa base model pre-trained on a medium-size corpus collected from publicly available corpora and crawlers (check the roberta-base-ca-v2 model card for more details).
+
+ ## Intended uses and limitations
+
+ The _roberta-base-ca-v2-cased-pos_ model can be used to tag the parts of speech (POS) of a text. The model is limited by its training dataset and may not generalize well for all use cases.
+
+ ## How to use
+
+ Here is how to use this model:
+
+ ```python
+ from transformers import pipeline
+ from pprint import pprint
+
+ # Load the fine-tuned model in a token-classification pipeline
+ nlp = pipeline("token-classification", model="projecte-aina/roberta-base-ca-v2-cased-pos")
+ example = "Em dic Lluïsa i visc a Santa Maria del Camí."
+
+ # Tag the example sentence and pretty-print the per-token predictions
+ pos_results = nlp(example)
+ pprint(pos_results)
+ ```
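+
+ Each element of `pos_results` is a standard `transformers` token-classification dict (keys such as `word`, `entity` and `score`). As a minimal sketch (assuming the default, non-aggregated pipeline output; the variable name `token_tags` is illustrative), the (token, tag) pairs can be pulled out like this:
+
+ ```python
+ # Illustrative only: pair each token with its predicted POS label.
+ # `pos_results` comes from the pipeline call above.
+ token_tags = [(t["word"], t["entity"]) for t in pos_results]
+ print(token_tags)
+ ```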
+
+ ## Limitations and bias
+
+ At the time of submission, no measures have been taken to estimate the bias embedded in the model. However, we are well aware that our models may be biased since the corpora have been collected using crawling techniques on multiple web sources. We intend to conduct research in these areas in the future, and if completed, this model card will be updated.
+
+ ## Training
+
+ ### Training data
+
+ We used the Catalan POS dataset from the Universal Dependencies Treebank, which we refer to as Ancora-ca-pos, for training and evaluation.
+
+ ### Training procedure
+
+ The model was trained with a batch size of 16 and a learning rate of 5e-5 for 5 epochs. We then selected the best checkpoint using the downstream task metric on the corresponding development set, and finally evaluated it on the test set.
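+
+ As a minimal sketch of that procedure (assuming the Hugging Face `Trainer` API; the dataset objects, label count and metric function below are hypothetical placeholders, not the project's actual training script):
+
+ ```python
+ from transformers import (AutoModelForTokenClassification, AutoTokenizer,
+                           Trainer, TrainingArguments)
+
+ # Hyperparameters from the model card; everything else is illustrative.
+ args = TrainingArguments(
+     output_dir="roberta-base-ca-v2-cased-pos",
+     per_device_train_batch_size=16,
+     learning_rate=5e-5,
+     num_train_epochs=5,
+     evaluation_strategy="epoch",   # score on the dev set each epoch
+     save_strategy="epoch",
+     load_best_model_at_end=True,   # keep the best dev-set checkpoint
+     metric_for_best_model="f1",
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("projecte-aina/roberta-base-ca-v2")
+ model = AutoModelForTokenClassification.from_pretrained(
+     "projecte-aina/roberta-base-ca-v2", num_labels=17)  # 17 UPOS tags (assumption)
+
+ trainer = Trainer(
+     model=model,
+     args=args,
+     train_dataset=train_ds,      # tokenized Ancora-ca-pos splits,
+     eval_dataset=dev_ds,         # prepared elsewhere (placeholders)
+     compute_metrics=compute_f1,  # hypothetical F1 metric function
+ )
+ trainer.train()
+ ```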
+
+ ## Evaluation
+
+ ### Variable and metrics
+
+ This model was finetuned maximizing F1 score.
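+
+ For instance (a sketch assuming per-token micro-averaged F1, computed here with scikit-learn on made-up tags):
+
+ ```python
+ from sklearn.metrics import f1_score
+
+ # Illustrative gold and predicted UPOS tags for one five-token sentence.
+ gold = ["PRON", "VERB", "PROPN", "CCONJ", "VERB"]
+ pred = ["PRON", "VERB", "NOUN", "CCONJ", "VERB"]
+
+ # Micro-averaged F1 over tokens: 4 of 5 tags match, so the score is 0.8.
+ print(f1_score(gold, pred, average="micro"))
+ ```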
+
+ ### Evaluation results
+
+ We evaluated the _roberta-base-ca-v2-cased-pos_ on the Ancora-ca-pos test set against standard multilingual and monolingual baselines:
+
+ | Model | Ancora-ca-pos (F1) |
+ | ------------ | ------------- |
+ | roberta-base-ca-v2-cased-pos | 98.96 |
+ | roberta-base-ca-cased-pos | 98.96 |
+ | mBERT | 98.83 |
+ | XLM-RoBERTa | 98.89 |
+
+ For more details, check the fine-tuning and evaluation scripts in the official GitHub repository.
+
+ ## Additional information
+
+ ### Author
+
+
 We evaluated the _roberta-base-ca-v2-cased-ner_ on the AnCora-Ca-NER test set against standard multilingual and monolingual baselines:
 
 | Model | AnCora-Ca-NER (F1)|