mmarimon committed on
Commit 1179280
1 Parent(s): 07d0679

Update README.md

Files changed (1)
  1. README.md +53 -1
@@ -51,7 +51,7 @@ widget:
 
 # Catalan BERTa-v2 (roberta-base-ca-v2) finetuned for Named Entity Recognition.
 
- ## Table of ContentsContents
+ ## Table of Contents
 <details>
 <summary>Click to expand</summary>
 
@@ -117,6 +117,58 @@ The model was trained with a batch size of 16 and a learning rate of 5e-5 for 5
 This model was finetuned maximizing F1 score.
 
 ### Evaluation results
+
+ # Catalan BERTa-v2 (roberta-base-ca-v2) finetuned for Part-of-speech tagging (POS)
+
+ ## Table of Contents
+ <details>
+ <summary>Click to expand</summary>
+ </details>
+
+ ## Model description
+
+ The _roberta-base-ca-v2-cased-pos_ is a Part-of-speech tagging (POS) model for the Catalan language, fine-tuned from the roberta-base-ca-v2 model, a RoBERTa base model pre-trained on a medium-size corpus collected from publicly available corpora and crawlers (check the roberta-base-ca-v2 model card for more details).
+
+ ## Intended uses and limitations
+
+ The _roberta-base-ca-v2-cased-pos_ model can be used to tag the parts of speech (POS) of a text. The model is limited by its training dataset and may not generalize well for all use cases.
+
+ ## How to use
+
+ Here is how to use this model:
+
+ ```python
+ from transformers import pipeline
+ from pprint import pprint
+
+ # Load the fine-tuned model in a token-classification pipeline
+ nlp = pipeline("token-classification", model="projecte-aina/roberta-base-ca-v2-cased-pos")
+ example = "Em dic Lluïsa i visc a Santa Maria del Camí."
+
+ # Tag the example sentence and pretty-print the per-token predictions
+ pos_results = nlp(example)
+ pprint(pos_results)
+ ```
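+
+ Each element of `pos_results` is a standard `transformers` token-classification dict (keys such as `word`, `entity` and `score`). As a minimal sketch (assuming the default, non-aggregated pipeline output; the variable name `token_tags` is illustrative), the (token, tag) pairs can be pulled out like this:
+
+ ```python
+ # Illustrative only: pair each token with its predicted POS label.
+ # `pos_results` comes from the pipeline call above.
+ token_tags = [(t["word"], t["entity"]) for t in pos_results]
+ print(token_tags)
+ ```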
+
+ ## Limitations and bias
+
+ At the time of submission, no measures have been taken to estimate the bias embedded in the model. However, we are well aware that our models may be biased since the corpora have been collected using crawling techniques on multiple web sources. We intend to conduct research in these areas in the future, and if completed, this model card will be updated.
+
+ ## Training
+
+ ### Training data
+
+ We used the Catalan POS dataset from the Universal Dependencies Treebank, which we refer to as Ancora-ca-pos, for training and evaluation.
+
+ ### Training procedure
+
+ The model was trained with a batch size of 16 and a learning rate of 5e-5 for 5 epochs. We then selected the best checkpoint using the downstream task metric on the corresponding development set, and finally evaluated it on the test set.
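+
+ As a minimal sketch of that procedure (assuming the Hugging Face `Trainer` API; the dataset objects, label count and metric function below are hypothetical placeholders, not the project's actual training script):
+
+ ```python
+ from transformers import (AutoModelForTokenClassification, AutoTokenizer,
+                           Trainer, TrainingArguments)
+
+ # Hyperparameters from the model card; everything else is illustrative.
+ args = TrainingArguments(
+     output_dir="roberta-base-ca-v2-cased-pos",
+     per_device_train_batch_size=16,
+     learning_rate=5e-5,
+     num_train_epochs=5,
+     evaluation_strategy="epoch",   # score on the dev set each epoch
+     save_strategy="epoch",
+     load_best_model_at_end=True,   # keep the best dev-set checkpoint
+     metric_for_best_model="f1",
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("projecte-aina/roberta-base-ca-v2")
+ model = AutoModelForTokenClassification.from_pretrained(
+     "projecte-aina/roberta-base-ca-v2", num_labels=17)  # 17 UPOS tags (assumption)
+
+ trainer = Trainer(
+     model=model,
+     args=args,
+     train_dataset=train_ds,      # tokenized Ancora-ca-pos splits,
+     eval_dataset=dev_ds,         # prepared elsewhere (placeholders)
+     compute_metrics=compute_f1,  # hypothetical F1 metric function
+ )
+ trainer.train()
+ ```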
+
+ ## Evaluation
+
+ ### Variable and metrics
+
+ This model was finetuned maximizing F1 score.
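+
+ For instance (a sketch assuming per-token micro-averaged F1, computed here with scikit-learn on made-up tags):
+
+ ```python
+ from sklearn.metrics import f1_score
+
+ # Illustrative gold and predicted UPOS tags for one five-token sentence.
+ gold = ["PRON", "VERB", "PROPN", "CCONJ", "VERB"]
+ pred = ["PRON", "VERB", "NOUN", "CCONJ", "VERB"]
+
+ # Micro-averaged F1 over tokens: 4 of 5 tags match, so the score is 0.8.
+ print(f1_score(gold, pred, average="micro"))
+ ```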
+
+ ### Evaluation results
+
+ We evaluated the _roberta-base-ca-v2-cased-pos_ on the Ancora-ca-pos test set against standard multilingual and monolingual baselines:
+
+ | Model | Ancora-ca-pos (F1) |
+ | ------------ | ------------- |
+ | roberta-base-ca-v2-cased-pos | 98.96 |
+ | roberta-base-ca-cased-pos | 98.96 |
+ | mBERT | 98.83 |
+ | XLM-RoBERTa | 98.89 |
+
+ For more details, check the fine-tuning and evaluation scripts in the official GitHub repository.
+
+ ## Additional information
+
+ ### Author
+
+
 We evaluated the _roberta-base-ca-v2-cased-ner_ on the AnCora-Ca-NER test set against standard multilingual and monolingual baselines:
 
 | Model | AnCora-Ca-NER (F1)|