imvladikon committed on
Commit a40ab9f
1 Parent(s): 83e24e5

iahlt/span-marker-alephbert-small-nemo-mt-he

README.md CHANGED
@@ -1,93 +1,240 @@
  ---
- language:
- - he
  tags:
- - language model
- pipeline_tag: feature-extraction
  ---

- ## AlephBertGimmel
- Modern Hebrew pretrained BERT model with a 128K-token vocabulary.
-
- [Checkpoint](https://github.com/Dicta-Israel-Center-for-Text-Analysis/alephbertgimmel/tree/main/alephbertgimmel-small/ckpt_29400--Max128Seq) of alephbertgimmel-small-128 from [alephbertgimmel](https://github.com/Dicta-Israel-Center-for-Text-Analysis/alephbertgimmel)

  ```python
- import torch
- from transformers import AutoModelForMaskedLM, AutoTokenizer
-
- model = AutoModelForMaskedLM.from_pretrained("imvladikon/alephbertgimmel-small-128")
- tokenizer = AutoTokenizer.from_pretrained("imvladikon/alephbertgimmel-small-128")
-
- # "{} is a metropolis constituting the center of the economy"
- text = "{} היא מטרופולין המהווה את מרכז הכלכלה"
-
- input_ids = tokenizer.encode(text.format("[MASK]"), return_tensors="pt")
- mask_token_index = torch.where(input_ids == tokenizer.mask_token_id)[1]
-
- token_logits = model(input_ids).logits
- mask_token_logits = token_logits[0, mask_token_index, :]
- top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()
-
- for token in top_5_tokens:
-     print(text.format(tokenizer.decode([token])))
-
- # ישראל היא מטרופולין המהווה את מרכז הכלכלה
- # ירושלים היא מטרופולין המהווה את מרכז הכלכלה
- # חיפה היא מטרופולין המהווה את מרכז הכלכלה
- # אילת היא מטרופולין המהווה את מרכז הכלכלה
- # אשדוד היא מטרופולין המהווה את מרכז הכלכלה
  ```

- ```python
- # Naive perplexity: score the whole sentence once, using every position as a label.
- def ppl_naive(text, model, tokenizer):
-     input_ids = tokenizer.encode(text, return_tensors="pt")
-     loss = model(input_ids, labels=input_ids)[0]
-     return torch.exp(loss).item()
-
- # "{} is the capital of the State of Israel and the country's largest city by population"
- text = """{} היא עיר הבירה של מדינת ישראל, והעיר הגדולה ביותר בישראל בגודל האוכלוסייה"""
-
- for word in ["חיפה", "ירושלים", "תל אביב"]:
-     print(ppl_naive(text.format(word), model, tokenizer))
-
- # 9.825098991394043
- # 10.594215393066406
- # 9.536449432373047
-
- # One would expect "ירושלים" (Jerusalem) to get the lowest value here, but it does not.
-
- # Pseudo-perplexity: mask each inner token in turn and average the masked-token losses.
- @torch.inference_mode()
- def ppl_pseudo(text, model, tokenizer, ignore_idx=-100):
-     input_ids = tokenizer.encode(text, return_tensors="pt")
-     # One row per inner token; row i masks token i+1 (skipping [CLS] and [SEP]).
-     mask = torch.ones(input_ids.size(-1) - 1).diag(1)[:-2]
-     repeat_input = input_ids.repeat(input_ids.size(-1) - 2, 1)
-     masked_input = repeat_input.masked_fill(mask == 1, tokenizer.mask_token_id)
-     labels = repeat_input.masked_fill(masked_input != tokenizer.mask_token_id, ignore_idx)
-     loss = model(masked_input, labels=labels)[0]
-     return torch.exp(loss).item()
-
- for word in ["חיפה", "ירושלים", "תל אביב"]:
-     print(ppl_pseudo(text.format(word), model, tokenizer))
- # 4.346900939941406
- # 3.292382001876831
- # 2.732590913772583
- ```

- When using AlephBertGimmel, please cite:

- ```bibtex
- @misc{guetta2022large,
-   title={Large Pre-Trained Models with Extra-Large Vocabularies: A Contrastive Analysis of Hebrew BERT Models and a New One to Outperform Them All},
-   author={Eylon Guetta and Avi Shmidman and Shaltiel Shmidman and Cheyn Shmuel Shmidman and Joshua Guedalia and Moshe Koppel and Dan Bareket and Amit Seker and Reut Tsarfaty},
-   year={2022},
-   eprint={2211.15199},
-   archivePrefix={arXiv},
-   primaryClass={cs.CL}
- }
- ```
 
  ---
+ library_name: span-marker
  tags:
+ - span-marker
+ - token-classification
+ - ner
+ - named-entity-recognition
+ - generated_from_span_marker_trainer
+ datasets:
+ - imvladikon/nemo_corpus
+ metrics:
+ - precision
+ - recall
+ - f1
+ widget:
+ - text: אחר כך הצטרף ל דאלאס מאווריקס מ ה אנ.בי.איי ו חזר לשחק ב אירופה ב ספרד ב מדי
+     קאחה בילבאו ו חירונה
+ - text: ב קיץ 1982 ניסה טל ברודי (אז עוזר ה מאמן) להחתימו, אבל בריאנט, ש סבתו יהודיה,
+     חתם אז ב פורד קאנטו ו זכה עמ היא ב אותה עונה ב גביע אירופה ל אלופות.
+ - text: יו"ר ועדת ה נוער נתן סלובטיק אמר ש ה שחקנים של אנחנו לא משתלבים ב אירופה.
+ - text: ב ה סגל ש יתכנס מחר אחר ה צהריים ל מחנה אימונים ב שפיים 17 שחקנים, כולל מוזמן
+     חדש שירן אדירי מ מכבי תל אביב.
+ - text: 'תוצאות אחרות: טורינו 2 (מורלו עצמי, מולר) לצה 0; קאליארי 0 לאציו 1 (פסטה,
+     שער עצמי); פיורנטינה 2 (נאפי, פאציונה) גנואה 2 (אורלאנדו, שקוראווי).'
+ pipeline_tag: token-classification
+ model-index:
+ - name: SpanMarker
+   results:
+   - task:
+       type: token-classification
+       name: Named Entity Recognition
+     dataset:
+       name: Unknown
+       type: imvladikon/nemo_corpus
+       split: test
+     metrics:
+     - type: f1
+       value: 0.7338129496402878
+       name: F1
+     - type: precision
+       value: 0.7577142857142857
+       name: Precision
+     - type: recall
+       value: 0.7113733905579399
+       name: Recall
  ---

+ # SpanMarker
+
+ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [imvladikon/nemo_corpus](https://huggingface.co/datasets/imvladikon/nemo_corpus) dataset that can be used for Named Entity Recognition.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SpanMarker
+ - **Encoder:** [imvladikon/alephbertgimmel-small-128](https://huggingface.co/imvladikon/alephbertgimmel-small-128)
+ - **Maximum Sequence Length:** 512 tokens
+ - **Maximum Entity Length:** 100 words
+ - **Training Dataset:** [imvladikon/nemo_corpus](https://huggingface.co/datasets/imvladikon/nemo_corpus)
+ - **Language:** Hebrew
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
+ - **Thesis:** [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)
+
+ ### Model Labels
+ (Labels follow the NEMO corpus scheme: ANG language, DUC product, EVE event, FAC facility, GPE geopolitical entity, LOC location, ORG organization, PER person, WOA work of art.)
+
+ | Label | Examples |
+ |:------|:------------------------------------------------|
+ | ANG | "יידיש", "גרמנית", "אנגלית" |
+ | DUC | "דינמיט", "סובארו", "מרצדס" |
+ | EVE | "מצדה", "הצהרת בלפור", "ה שואה" |
+ | FAC | "ברזילי", "כלא עזה", "תל - ה שומר" |
+ | GPE | "ה שטחים", "שפרעם", "רצועת עזה" |
+ | LOC | "שייח רדואן", "גיבאליה", "חאן יונס" |
+ | ORG | "כך", "ה ארץ", "מרחב ה גליל" |
+ | PER | "רמי רהב", "נימר חוסיין", "איברהים נימר חוסיין" |
+ | WOA | "קיטש ו מוות", "קדיש", "ה ארץ" |
+
+ ## Evaluation
+
+ ### Metrics
+ | Label | Precision | Recall | F1 |
+ |:--------|:----------|:-------|:-------|
+ | **all** | 0.7577 | 0.7114 | 0.7338 |
+ | ANG | 0.0 | 0.0 | 0.0 |
+ | DUC | 0.0 | 0.0 | 0.0 |
+ | FAC | 0.0 | 0.0 | 0.0 |
+ | GPE | 0.7085 | 0.8103 | 0.7560 |
+ | LOC | 0.5714 | 0.1951 | 0.2909 |
+ | ORG | 0.7460 | 0.6912 | 0.7176 |
+ | PER | 0.8301 | 0.8052 | 0.8175 |
+ | WOA | 0.0 | 0.0 | 0.0 |
+
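As a sanity check, the test-split numbers above should be reproducible with the SpanMarker `Trainer`; a minimal sketch, assuming the `imvladikon/nemo_corpus` dataset exposes a `test` split and the standard `span_marker` evaluation API (this is not taken from this repository's training script):

```python
# Hedged sketch: re-evaluate the released checkpoint on the test split.
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

model = SpanMarkerModel.from_pretrained("iahlt/span-marker-alephbert-small-nemo-mt-he")
dataset = load_dataset("imvladikon/nemo_corpus")

# A Trainer with default arguments is enough for evaluation only.
trainer = Trainer(model=model, eval_dataset=dataset["test"])
print(trainer.evaluate())  # expect overall P/R/F1 near the table above
```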
+ ## Uses
+
+ ### Direct Use for Inference

  ```python
+ from span_marker import SpanMarkerModel
+
+ # Download from the 🤗 Hub
+ model = SpanMarkerModel.from_pretrained("iahlt/span-marker-alephbert-small-nemo-mt-he")
+ # Run inference
+ entities = model.predict("יו\"ר ועדת ה נוער נתן סלובטיק אמר ש ה שחקנים של אנחנו לא משתלבים ב אירופה.")
  ```
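For reference, `predict` returns structured spans rather than raw BIO tags; a small sketch of consuming the output, assuming the dictionary keys documented upstream in SpanMarkerNER (`span`, `label`, `score`):

```python
# Hedged sketch: the key names are an assumption based on the upstream
# SpanMarkerNER documentation, not verified against this exact version.
for entity in entities:
    print(entity["span"], entity["label"], round(entity["score"], 3))
```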

+ ### Downstream Use
+ You can fine-tune this model on your own annotated dataset.

+ <details><summary>Click to expand</summary>

+ ```python
+ from datasets import load_dataset
+ from span_marker import SpanMarkerModel, Trainer
+
+ # Download from the 🤗 Hub
+ model = SpanMarkerModel.from_pretrained("iahlt/span-marker-alephbert-small-nemo-mt-he")
+
+ # Specify a Dataset with "tokens" and "ner_tags" columns
+ dataset = load_dataset("imvladikon/nemo_corpus")  # for example, the corpus this model was trained on
+
+ # Initialize a Trainer using the pretrained model & dataset
+ trainer = Trainer(
+     model=model,
+     train_dataset=dataset["train"],
+     eval_dataset=dataset["validation"],
+ )
+ trainer.train()
+ trainer.save_model("span-marker-alephbert-small-nemo-mt-he-finetuned")
+ ```
+ </details>
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:----------------------|:----|:--------|:----|
+ | Sentence length | 1 | 25.4427 | 117 |
+ | Entities per sentence | 0 | 1.2472 | 20 |
+
+ ### Training Hyperparameters
+ - learning_rate: 1e-05
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - gradient_accumulation_steps: 2
+ - total_train_batch_size: 4
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.1
+ - num_epochs: 4
+ - mixed_precision_training: Native AMP (see the reconstruction sketch below)
+
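The list above maps directly onto standard `transformers.TrainingArguments`; a minimal sketch under that assumption (this is not the repository's actual training script, and the output path is hypothetical):

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters listed above.
args = TrainingArguments(
    output_dir="models/span-marker-alephbert-small-nemo",  # hypothetical path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,  # 2 x 2 = total train batch size 4
    num_train_epochs=4,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,  # "Native AMP" mixed precision
)
```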
+ ### Training Results
+ The epoch counter below restarts twice, apparently because the log concatenates three successive training runs.
+
+ | Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
+ |:------:|:----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
+ | 0.4070 | 1000 | 0.0352 | 0.0 | 0.0 | 0.0 | 0.8980 |
+ | 0.8140 | 2000 | 0.0327 | 0.0 | 0.0 | 0.0 | 0.8980 |
+ | 1.2210 | 3000 | 0.0224 | 0.0 | 0.0 | 0.0 | 0.8980 |
+ | 1.6280 | 4000 | 0.0149 | 0.5874 | 0.2200 | 0.3201 | 0.9134 |
+ | 2.0350 | 5000 | 0.0137 | 0.55 | 0.3895 | 0.4560 | 0.9248 |
+ | 2.4420 | 6000 | 0.0113 | 0.6204 | 0.4313 | 0.5089 | 0.9298 |
+ | 2.8490 | 7000 | 0.0121 | 0.5733 | 0.5075 | 0.5384 | 0.9310 |
+ | 3.2560 | 8000 | 0.0115 | 0.5782 | 0.5236 | 0.5495 | 0.9334 |
+ | 3.6630 | 9000 | 0.0108 | 0.6100 | 0.5354 | 0.5703 | 0.9359 |
+ | 0.4070 | 1000 | 0.0103 | 0.6321 | 0.5880 | 0.6092 | 0.9381 |
+ | 0.8140 | 2000 | 0.0088 | 0.6968 | 0.6288 | 0.6610 | 0.9471 |
+ | 1.2210 | 3000 | 0.0091 | 0.6790 | 0.6695 | 0.6742 | 0.9484 |
+ | 1.6280 | 4000 | 0.0086 | 0.6845 | 0.6845 | 0.6845 | 0.9480 |
+ | 2.0350 | 5000 | 0.0089 | 0.6802 | 0.6845 | 0.6824 | 0.9492 |
+ | 2.4420 | 6000 | 0.0084 | 0.6938 | 0.6953 | 0.6945 | 0.9539 |
+ | 2.8490 | 7000 | 0.0088 | 0.6884 | 0.7039 | 0.6960 | 0.9512 |
+ | 3.2560 | 8000 | 0.0086 | 0.6895 | 0.7124 | 0.7008 | 0.9514 |
+ | 3.6630 | 9000 | 0.0082 | 0.6989 | 0.7049 | 0.7019 | 0.9526 |
+ | 0.4070 | 1000 | 0.0080 | 0.7109 | 0.7124 | 0.7117 | 0.9535 |
+ | 0.8140 | 2000 | 0.0074 | 0.7577 | 0.7114 | 0.7338 | 0.9567 |
+ | 1.2210 | 3000 | 0.0083 | 0.7183 | 0.7414 | 0.7297 | 0.9554 |
+ | 1.6280 | 4000 | 0.0088 | 0.6987 | 0.7339 | 0.7159 | 0.9510 |
+ | 2.0350 | 5000 | 0.0086 | 0.7135 | 0.7296 | 0.7215 | 0.9541 |
+ | 2.4420 | 6000 | 0.0086 | 0.7167 | 0.7382 | 0.7273 | 0.9559 |
+ | 2.8490 | 7000 | 0.0088 | 0.7133 | 0.7554 | 0.7337 | 0.9541 |
+ | 3.2560 | 8000 | 0.0085 | 0.7165 | 0.7511 | 0.7334 | 0.9551 |
+ | 3.6630 | 9000 | 0.0083 | 0.7263 | 0.7489 | 0.7375 | 0.9561 |
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - SpanMarker: 1.5.0
+ - Transformers: 4.35.2
+ - PyTorch: 2.1.0+cu118
+ - Datasets: 2.15.0
+ - Tokenizers: 0.15.0
+
+ ## Citation
+
+ ### BibTeX
+ ```
+ @software{Aarsen_SpanMarker,
+     author = {Aarsen, Tom},
+     license = {Apache-2.0},
+     title = {{SpanMarker for Named Entity Recognition}},
+     url = {https://github.com/tomaarsen/SpanMarkerNER}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
added_tokens.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "<end>": 128001,
+   "<start>": 128000
+ }
config.json CHANGED
@@ -1,25 +1,231 @@
  {
-   "_name_or_path": "/content/alephbertgimmel/alephbertgimmel-small/ckpt_29400--Max128Seq",
    "architectures": [
-     "BertModel"
+     "SpanMarkerModel"
    ],
-   "attention_probs_dropout_prob": 0.1,
-   "classifier_dropout": null,
-   "hidden_act": "gelu",
-   "hidden_dropout_prob": 0.1,
-   "hidden_size": 512,
-   "initializer_range": 0.02,
-   "intermediate_size": 2048,
-   "layer_norm_eps": 1e-12,
-   "max_position_embeddings": 512,
-   "model_type": "bert",
-   "num_attention_heads": 8,
-   "num_hidden_layers": 4,
-   "pad_token_id": 0,
-   "position_embedding_type": "absolute",
+   "encoder": {
+     "_name_or_path": "imvladikon/alephbertgimmel-small-128",
+     "add_cross_attention": false,
+     "architectures": [
+       "BertForMaskedLM"
+     ],
+     "attention_probs_dropout_prob": 0.1,
+     "bad_words_ids": null,
+     "begin_suppress_tokens": null,
+     "bos_token_id": null,
+     "chunk_size_feed_forward": 0,
+     "classifier_dropout": null,
+     "cross_attention_hidden_size": null,
+     "decoder_start_token_id": null,
+     "diversity_penalty": 0.0,
+     "do_sample": false,
+     "early_stopping": false,
+     "encoder_no_repeat_ngram_size": 0,
+     "eos_token_id": null,
+     "exponential_decay_length_penalty": null,
+     "finetuning_task": null,
+     "forced_bos_token_id": null,
+     "forced_eos_token_id": null,
+     "hidden_act": "gelu",
+     "hidden_dropout_prob": 0.1,
+     "hidden_size": 512,
+     "id2label": {
+       "0": "S-ANG",
+       "1": "B-ANG",
+       "2": "I-ANG",
+       "3": "E-ANG",
+       "4": "S-DUC",
+       "5": "B-DUC",
+       "6": "I-DUC",
+       "7": "E-DUC",
+       "8": "B-EVE",
+       "9": "E-EVE",
+       "10": "S-EVE",
+       "11": "I-EVE",
+       "12": "S-FAC",
+       "13": "B-FAC",
+       "14": "E-FAC",
+       "15": "I-FAC",
+       "16": "S-GPE",
+       "17": "B-GPE",
+       "18": "E-GPE",
+       "19": "I-GPE",
+       "20": "S-LOC",
+       "21": "B-LOC",
+       "22": "E-LOC",
+       "23": "I-LOC",
+       "24": "O",
+       "25": "S-ORG",
+       "26": "B-ORG",
+       "27": "E-ORG",
+       "28": "I-ORG",
+       "29": "B-PER",
+       "30": "I-PER",
+       "31": "E-PER",
+       "32": "S-PER",
+       "33": "B-WOA",
+       "34": "E-WOA",
+       "35": "I-WOA",
+       "36": "S-WOA"
+     },
+     "initializer_range": 0.02,
+     "intermediate_size": 2048,
+     "is_decoder": false,
+     "is_encoder_decoder": false,
+     "label2id": {
+       "B-ANG": 1,
+       "B-DUC": 5,
+       "B-EVE": 8,
+       "B-FAC": 13,
+       "B-GPE": 17,
+       "B-LOC": 21,
+       "B-ORG": 26,
+       "B-PER": 29,
+       "B-WOA": 33,
+       "E-ANG": 3,
+       "E-DUC": 7,
+       "E-EVE": 9,
+       "E-FAC": 14,
+       "E-GPE": 18,
+       "E-LOC": 22,
+       "E-ORG": 27,
+       "E-PER": 31,
+       "E-WOA": 34,
+       "I-ANG": 2,
+       "I-DUC": 6,
+       "I-EVE": 11,
+       "I-FAC": 15,
+       "I-GPE": 19,
+       "I-LOC": 23,
+       "I-ORG": 28,
+       "I-PER": 30,
+       "I-WOA": 35,
+       "O": 24,
+       "S-ANG": 0,
+       "S-DUC": 4,
+       "S-EVE": 10,
+       "S-FAC": 12,
+       "S-GPE": 16,
+       "S-LOC": 20,
+       "S-ORG": 25,
+       "S-PER": 32,
+       "S-WOA": 36
+     },
+     "layer_norm_eps": 1e-12,
+     "length_penalty": 1.0,
+     "max_length": 20,
+     "max_position_embeddings": 512,
+     "min_length": 0,
+     "model_type": "bert",
+     "no_repeat_ngram_size": 0,
+     "num_attention_heads": 8,
+     "num_beam_groups": 1,
+     "num_beams": 1,
+     "num_hidden_layers": 4,
+     "num_return_sequences": 1,
+     "output_attentions": false,
+     "output_hidden_states": false,
+     "output_scores": false,
+     "pad_token_id": 0,
+     "position_embedding_type": "absolute",
+     "prefix": null,
+     "problem_type": null,
+     "pruned_heads": {},
+     "remove_invalid_values": false,
+     "repetition_penalty": 1.0,
+     "return_dict": true,
+     "return_dict_in_generate": false,
+     "sep_token_id": null,
+     "suppress_tokens": null,
+     "task_specific_params": null,
+     "temperature": 1.0,
+     "tf_legacy_loss": false,
+     "tie_encoder_decoder": false,
+     "tie_word_embeddings": true,
+     "tokenizer_class": null,
+     "top_k": 50,
+     "top_p": 1.0,
+     "torch_dtype": "float32",
+     "torchscript": false,
+     "transformers_version": "4.35.2",
+     "type_vocab_size": 2,
+     "typical_p": 1.0,
+     "use_bfloat16": false,
+     "use_cache": true,
+     "vocab_size": 128008
+   },
+   "entity_max_length": 100,
+   "id2label": {
+     "0": "O",
+     "1": "ANG",
+     "2": "DUC",
+     "3": "EVE",
+     "4": "FAC",
+     "5": "GPE",
+     "6": "LOC",
+     "7": "ORG",
+     "8": "PER",
+     "9": "WOA"
+   },
+   "id2reduced_id": {
+     "0": 1,
+     "1": 1,
+     "2": 1,
+     "3": 1,
+     "4": 2,
+     "5": 2,
+     "6": 2,
+     "7": 2,
+     "8": 3,
+     "9": 3,
+     "10": 3,
+     "11": 3,
+     "12": 4,
+     "13": 4,
+     "14": 4,
+     "15": 4,
+     "16": 5,
+     "17": 5,
+     "18": 5,
+     "19": 5,
+     "20": 6,
+     "21": 6,
+     "22": 6,
+     "23": 6,
+     "24": 0,
+     "25": 7,
+     "26": 7,
+     "27": 7,
+     "28": 7,
+     "29": 8,
+     "30": 8,
+     "31": 8,
+     "32": 8,
+     "33": 9,
+     "34": 9,
+     "35": 9,
+     "36": 9
+   },
+   "label2id": {
+     "ANG": 1,
+     "DUC": 2,
+     "EVE": 3,
+     "FAC": 4,
+     "GPE": 5,
+     "LOC": 6,
+     "O": 0,
+     "ORG": 7,
+     "PER": 8,
+     "WOA": 9
+   },
+   "marker_max_length": 128,
+   "max_next_context": null,
+   "max_prev_context": null,
+   "model_max_length": 512,
+   "model_max_length_default": 512,
+   "model_type": "span-marker",
+   "span_marker_version": "1.5.0",
    "torch_dtype": "float32",
+   "trained_with_document_context": false,
    "transformers_version": "4.35.2",
-   "type_vocab_size": 2,
-   "use_cache": true,
-   "vocab_size": 128000
+   "vocab_size": 128008
  }
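The `id2reduced_id` table above is what collapses the encoder's 37 BIOES tag ids onto the 10 flat entity labels that SpanMarker predicts; an illustrative sketch of the invariant it encodes (values copied from the JSON above, variable names hypothetical):

```python
# Each BIOES tag id for PER (B-PER=29 ... S-PER=32) maps to reduced id 8 = "PER".
encoder_id2label = {29: "B-PER", 30: "I-PER", 31: "E-PER", 32: "S-PER"}
id2reduced_id = {29: 8, 30: 8, 31: 8, 32: 8}
reduced_id2label = {8: "PER"}

for tag_id, tag in encoder_id2label.items():
    # The reduced label is the tag with its BIOES prefix stripped.
    assert reduced_id2label[id2reduced_id[tag_id]] == tag.split("-")[-1]
```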
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f3b26e839690ccd14b72f6959e99a39c0d4da600f8cde12639e72825daf2f9c4
- size 314697456
+ oid sha256:0921ec8eacb9a71752105455edf2ab73e2500a3e106180fc2cdf0754aa6633aa
+ size 314755568
runs/Nov23_23-23-18_25de05a58e1f/events.out.tfevents.1700781874.25de05a58e1f.158.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:96310a7617981ee7a98c9e2be884c0fefc9ff76ca72c4e428992d788a8fee9a0
+ size 44099
runs/Nov23_23-23-18_25de05a58e1f/events.out.tfevents.1700783199.25de05a58e1f.158.1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b59a34a4d7694c50608a11b8e520c2bf1850a0f4102beaea47bd765fcd90bb4a
+ size 1096
runs/Nov23_23-23-18_25de05a58e1f/events.out.tfevents.1700784030.25de05a58e1f.158.2 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:99d21bf28512cf124e7bf020803b5cb2db40586af57cc9f7ab630f44512512de
+ size 44175
runs/Nov23_23-23-18_25de05a58e1f/events.out.tfevents.1700785412.25de05a58e1f.158.3 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:92a7ee30a4ab77f11210ed6b17eb05026189558960483cdc2cac5a0c7192752a
+ size 44175
runs/Nov23_23-23-18_25de05a58e1f/events.out.tfevents.1700786721.25de05a58e1f.158.4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:435ec24d374619881085d6d3dc931ff8d3dbc0020e1977189324220d9f2ac72a
+ size 1096
tokenizer.json CHANGED
@@ -1,7 +1,21 @@
  {
    "version": "1.0",
-   "truncation": null,
-   "padding": null,
+   "truncation": {
+     "direction": "Right",
+     "max_length": 512,
+     "strategy": "LongestFirst",
+     "stride": 0
+   },
+   "padding": {
+     "strategy": {
+       "Fixed": 512
+     },
+     "direction": "Right",
+     "pad_to_multiple_of": null,
+     "pad_id": 3,
+     "pad_type_id": 0,
+     "pad_token": "[PAD]"
+   },
    "added_tokens": [
      {
        "id": 0,
@@ -47,6 +61,24 @@
        "rstrip": false,
        "normalized": false,
        "special": true
+     },
+     {
+       "id": 128000,
+       "content": "<start>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
+     },
+     {
+       "id": 128001,
+       "content": "<end>",
+       "single_word": false,
+       "lstrip": false,
+       "rstrip": false,
+       "normalized": false,
+       "special": true
      }
    ],
    "normalizer": {
tokenizer_config.json CHANGED
@@ -1,4 +1,5 @@
  {
+   "add_prefix_space": true,
    "added_tokens_decoder": {
      "0": {
        "content": "[UNK]",
@@ -39,14 +40,31 @@
        "rstrip": false,
        "single_word": false,
        "special": true
+     },
+     "128000": {
+       "content": "<start>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128001": {
+       "content": "<end>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
      }
    },
    "clean_up_tokenization_spaces": true,
    "cls_token": "[CLS]",
    "do_basic_tokenize": true,
    "do_lower_case": true,
+   "entity_max_length": 100,
    "mask_token": "[MASK]",
-   "model_max_length": 1000000000000000019884624838656,
+   "model_max_length": 512,
    "never_split": null,
    "pad_token": "[PAD]",
    "sep_token": "[SEP]",
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e5cd46ec2a8cfe36fcb897224363cd1797644a2c2f91178322ebc1298747396f
+ size 4600