whitemouse84 committed
Commit
2412ed2
1 Parent(s): 6c24a5a

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "word_embedding_dimension": 768,
+     "pooling_mode_cls_token": true,
+     "pooling_mode_mean_tokens": false,
+     "pooling_mode_max_tokens": false,
+     "pooling_mode_mean_sqrt_len_tokens": false,
+     "pooling_mode_weightedmean_tokens": false,
+     "pooling_mode_lasttoken": false,
+     "include_prompt": true
+ }
2_Dense/config.json ADDED
@@ -0,0 +1 @@
+ {"in_features": 768, "out_features": 768, "bias": true, "activation_function": "torch.nn.modules.activation.Tanh"}
2_Dense/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f403f7a3c694bb883eb53f22727d5cd79ceaf4bce215ed1b357a4abb36e7d403
+ size 2362528
README.md ADDED
@@ -0,0 +1,706 @@
+ ---
+ base_model: cointegrated/LaBSE-en-ru
+ datasets: []
+ language: []
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ - negative_mse
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:10975066
+ - loss:MSELoss
+ widget:
+ - source_sentence: Такие лодки строились, чтобы получить быстрый доступ к приходящим
+     судам.
+   sentences:
+   - been nice talking to you
+   - Нельзя ставить под сомнение притязания клиента, если не были предприняты шаги.
+   - Dharangaon Railway Station serves Dharangaon in Jalgaon district in the Indian
+     state of Maharashtra.
+ - source_sentence: Если прилагательные смягчают этнические термины, существительные
+     могут сделать их жестче.
+   sentences:
+   - Вслед за этим последовало секретное письмо А.Б.Чубайса об изъятии у МЦР, переданного
+     ему С.Н.Рерихом наследия.
+   - Coaches should not give young athletes a hard time.
+   - Эшкрофт хотел прослушивать сводки новостей снова и снова
+ - source_sentence: Земля была мягкой.
+   sentences:
+   - По мере того, как самообладание покидало его, сердце его все больше наполнялось
+     тревогой.
+   - Our borders and immigration system, including law enforcement, ought to send a
+     message of welcome, tolerance, and justice to members of immigrant communities
+     in the United States and in their countries of origin.
+   - Начнут действовать льготные условия аренды земель, которые предназначены для реализации
+     инвестиционных проектов.
+ - source_sentence: 'Что же касается рава Кука: мой рав лично знал его и много раз
+     с теплотой рассказывал мне о нем как о великом каббалисте.'
+   sentences:
+   - Вдова Эдгара Эванса, его дети и мать получили 1500 фунтов стерлингов (
+   - Please do not make any changes to your address.
+   - Мы уже закончили все запланированные дела!
+ - source_sentence: See Name section.
+   sentences:
+   - Ms. Packard is the voice of the female blood elf in the video game World of Warcraft.
+   - Основным функциональным элементом, реализующим функции управления соединением,
+     является абонентский терминал.
+   - Yeah, people who might not be hungry.
+ model-index:
+ - name: SentenceTransformer based on cointegrated/LaBSE-en-ru
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts dev
+       type: sts-dev
+     metrics:
+     - type: pearson_cosine
+       value: 0.5305176535187099
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.6347069834349862
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: 0.5553415140113596
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.6389336208598283
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: 0.5499910306125031
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.6347073809507647
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.5305176585564861
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.6347078463557637
+       name: Spearman Dot
+     - type: pearson_max
+       value: 0.5553415140113596
+       name: Pearson Max
+     - type: spearman_max
+       value: 0.6389336208598283
+       name: Spearman Max
+   - task:
+       type: knowledge-distillation
+       name: Knowledge Distillation
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: negative_mse
+       value: -0.006337030936265364
+       name: Negative Mse
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts test
+       type: sts-test
+     metrics:
+     - type: pearson_cosine
+       value: 0.5042796836494269
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.5986471772428711
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: 0.522744495080616
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.5983901280447074
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: 0.522721961447153
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.5986471095414022
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.504279685613151
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.598648155615724
+       name: Spearman Dot
+     - type: pearson_max
+       value: 0.522744495080616
+       name: Pearson Max
+     - type: spearman_max
+       value: 0.598648155615724
+       name: Spearman Max
+ ---
+ 
+ # SentenceTransformer based on cointegrated/LaBSE-en-ru
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [cointegrated/LaBSE-en-ru](https://huggingface.co/cointegrated/LaBSE-en-ru). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [cointegrated/LaBSE-en-ru](https://huggingface.co/cointegrated/LaBSE-en-ru) <!-- at revision cf0714e606d4af551e14ad69a7929cd6b0da7f7e -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
+   (3): Normalize()
+ )
+ ```
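+ 
+ For illustration, the four modules correspond to the following plain-PyTorch steps. This is a minimal sketch rather than the library's internals: the real Dense weights live in `2_Dense/model.safetensors` and are not loaded here, so the `dense` layer below is a randomly initialized stand-in.
+ 
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+ 
+ repo = "whitemouse84/LaBSE-en-ru-distilled-each-third-layer"
+ tokenizer = AutoTokenizer.from_pretrained(repo)
+ bert = AutoModel.from_pretrained(repo)
+ 
+ # (0) Transformer: tokenize and run the BERT backbone
+ batch = tokenizer(["Земля была мягкой."], padding=True, truncation=True,
+                   max_length=512, return_tensors="pt")
+ with torch.no_grad():
+     token_embeddings = bert(**batch).last_hidden_state
+ 
+ # (1) Pooling: pooling_mode_cls_token=True -> keep only the [CLS] token
+ cls = token_embeddings[:, 0]
+ 
+ # (2) Dense: 768 -> 768 linear layer with Tanh (stand-in weights here)
+ dense = torch.nn.Linear(768, 768)
+ sentence_embedding = torch.tanh(dense(cls))
+ 
+ # (3) Normalize: L2-normalize, so dot product equals cosine similarity
+ sentence_embedding = torch.nn.functional.normalize(sentence_embedding, p=2, dim=1)
+ ```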
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")
+ # Run inference
+ sentences = [
+     'See Name section.',
+     'Ms. Packard is the voice of the female blood elf in the video game World of Warcraft.',
+     'Yeah, people who might not be hungry.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+ 
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ ## Evaluation
+ 
+ ### Metrics
+ 
+ #### Semantic Similarity
+ * Dataset: `sts-dev`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+ 
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | pearson_cosine      | 0.5305     |
+ | **spearman_cosine** | **0.6347** |
+ | pearson_manhattan   | 0.5553     |
+ | spearman_manhattan  | 0.6389     |
+ | pearson_euclidean   | 0.5500     |
+ | spearman_euclidean  | 0.6347     |
+ | pearson_dot         | 0.5305     |
+ | spearman_dot        | 0.6347     |
+ | pearson_max         | 0.5553     |
+ | spearman_max        | 0.6389     |
+ 
+ #### Knowledge Distillation
+ 
+ * Evaluated with [<code>MSEEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.MSEEvaluator)
+ 
+ | Metric           | Value       |
+ |:-----------------|:------------|
+ | **negative_mse** | **-0.0063** |
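+ 
+ As a rough sketch of what this number means, assuming the usual MSEEvaluator convention (mean squared student-teacher difference, scaled by 100 and negated so that higher is better):
+ 
+ ```python
+ import numpy as np
+ from sentence_transformers import SentenceTransformer
+ 
+ student = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")
+ teacher = SentenceTransformer("cointegrated/LaBSE-en-ru")
+ 
+ sentences = ["Земля была мягкой.", "See Name section."]
+ s = student.encode(sentences)  # (2, 768) student embeddings
+ t = teacher.encode(sentences)  # (2, 768) teacher embeddings
+ 
+ negative_mse = -float(np.mean((t - s) ** 2)) * 100
+ print(negative_mse)  # close to 0 when the student mimics the teacher well
+ ```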
+ 
+ #### Semantic Similarity
+ * Dataset: `sts-test`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+ 
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | pearson_cosine      | 0.5043     |
+ | **spearman_cosine** | **0.5986** |
+ | pearson_manhattan   | 0.5227     |
+ | spearman_manhattan  | 0.5984     |
+ | pearson_euclidean   | 0.5227     |
+ | spearman_euclidean  | 0.5986     |
+ | pearson_dot         | 0.5043     |
+ | spearman_dot        | 0.5986     |
+ | pearson_max         | 0.5227     |
+ | spearman_max        | 0.5986     |
+ 
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 10,975,066 training samples
+ * Columns: <code>sentence</code> and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence | label |
+   |:--------|:---------|:------|
+   | type    | string   | list  |
+   | details | <ul><li>min: 6 tokens</li><li>mean: 26.93 tokens</li><li>max: 139 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
+ * Samples:
+   | sentence | label |
+   |:---------|:------|
+   | <code>It is based on the Java Persistence API (JPA), but it does not strictly follow the JSR 338 Specification, as it implements different design patterns and technologies.</code> | <code>[-0.012331949546933174, -0.04570527374744415, -0.024963658303022385, -0.03620213270187378, 0.022556383162736893, ...]</code> |
+   | <code>Покупаем вторичное сырье в Каунасе (Переработка вторичного сырья) - Алфенас АНД КО, ЗАО на Bizorg.</code> | <code>[-0.07498518377542496, -0.01913534104824066, -0.01797042042016983, 0.048263177275657654, -0.00016611881437711418, ...]</code> |
+   | <code>At the Equal Justice Conference ( EJC ) held in March 2001 in San Diego , LSC and the Project for the Future of Equal Justice held the second Case Management Software pre-conference .</code> | <code>[0.03870972990989685, -0.0638347640633583, -0.01696585863828659, -0.043612319976091385, -0.048241738229990005, ...]</code> |
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)
+ 
+ ### Evaluation Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 10,000 evaluation samples
+ * Columns: <code>sentence</code> and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence | label |
+   |:--------|:---------|:------|
+   | type    | string   | list  |
+   | details | <ul><li>min: 5 tokens</li><li>mean: 24.18 tokens</li><li>max: 111 tokens</li></ul> | <ul><li>size: 768 elements</li></ul> |
+ * Samples:
+   | sentence | label |
+   |:---------|:------|
+   | <code>The Canadian Canoe Museum is a museum dedicated to canoes located in Peterborough, Ontario, Canada.</code> | <code>[-0.05444105342030525, -0.03650881350040436, -0.041163671761751175, -0.010616903193295002, -0.04094529151916504, ...]</code> |
+   | <code>И мне нравилось, что я одновременно зарабатываю и смотрю бои».</code> | <code>[-0.03404555842280388, 0.028203096240758896, -0.056121889501810074, -0.0591997392475605, -0.05523117259144783, ...]</code> |
+   | <code>Ну, а на следующий день, разумеется, Президент Кеннеди объявил блокаду Кубы, и наши корабли остановили у кубинских берегов направлявшийся на Кубу российский корабль, и у него на борту нашли ракеты.</code> | <code>[-0.008193841204047203, 0.00694894278421998, -0.03027420863509178, -0.03290146216750145, 0.01425305474549532, ...]</code> |
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)
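+ 
+ In other words, each sample pairs a sentence with a precomputed 768-dimensional teacher embedding, and the student is trained to reproduce it. A minimal sketch of that setup (the actual training data is not published here; the single sample below is a placeholder):
+ 
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.losses import MSELoss
+ 
+ student = SentenceTransformer("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")
+ teacher = SentenceTransformer("cointegrated/LaBSE-en-ru")
+ 
+ texts = ["Земля была мягкой."]
+ train_dataset = Dataset.from_dict({
+     "sentence": texts,
+     "label": teacher.encode(texts).tolist(),  # teacher embeddings as targets
+ })
+ loss = MSELoss(model=student)  # squared error between student output and label
+ ```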
+ 
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+ 
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `learning_rate`: 0.0001
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `load_best_model_at_end`: True
+ 
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+ 
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 0.0001
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+ 
+ </details>
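+ 
+ These non-default values map directly onto the sentence-transformers v3 training API. A sketch, reusing `student`, `train_dataset`, and `loss` from the distillation sketch above (the evaluation dataset is omitted here):
+ 
+ ```python
+ from sentence_transformers import (
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ 
+ args = SentenceTransformerTrainingArguments(
+     output_dir="output",
+     eval_strategy="steps",
+     per_device_train_batch_size=64,
+     per_device_eval_batch_size=64,
+     learning_rate=1e-4,
+     num_train_epochs=1,
+     warmup_ratio=0.1,
+     fp16=True,
+     load_best_model_at_end=True,
+ )
+ trainer = SentenceTransformerTrainer(
+     model=student,
+     args=args,
+     train_dataset=train_dataset,
+     loss=loss,
+ )
+ # trainer.train()  # also requires an eval_dataset with eval_strategy="steps"
+ ```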
+ 
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+ 
+ | Epoch      | Step     | Training Loss | loss       | negative_mse | sts-dev_spearman_cosine | sts-test_spearman_cosine |
+ |:----------:|:--------:|:-------------:|:----------:|:------------:|:-----------------------:|:------------------------:|
+ | 0          | 0        | -             | -          | -0.2381      | 0.4206                  | -                        |
+ | 0.0058     | 1000     | 0.0014        | -          | -            | -                       | -                        |
+ | 0.0117     | 2000     | 0.0009        | -          | -            | -                       | -                        |
+ | 0.0175     | 3000     | 0.0007        | -          | -            | -                       | -                        |
+ | 0.0233     | 4000     | 0.0006        | -          | -            | -                       | -                        |
+ | **0.0292** | **5000** | **0.0005**    | **0.0004** | **-0.0363**  | **0.6393**              | **-**                    |
+ | 0.0350     | 6000     | 0.0004        | -          | -            | -                       | -                        |
+ | 0.0408     | 7000     | 0.0004        | -          | -            | -                       | -                        |
+ | 0.0467     | 8000     | 0.0003        | -          | -            | -                       | -                        |
+ | 0.0525     | 9000     | 0.0003        | -          | -            | -                       | -                        |
+ | 0.0583     | 10000    | 0.0003        | 0.0002     | -0.0207      | 0.6350                  | -                        |
+ | 0.0641     | 11000    | 0.0003        | -          | -            | -                       | -                        |
+ | 0.0700     | 12000    | 0.0003        | -          | -            | -                       | -                        |
+ | 0.0758     | 13000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.0816     | 14000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.0875     | 15000    | 0.0002        | 0.0002     | -0.0157      | 0.6328                  | -                        |
+ | 0.0933     | 16000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.0991     | 17000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1050     | 18000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1108     | 19000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1166     | 20000    | 0.0002        | 0.0001     | -0.0132      | 0.6317                  | -                        |
+ | 0.1225     | 21000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1283     | 22000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1341     | 23000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1400     | 24000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1458     | 25000    | 0.0002        | 0.0001     | -0.0118      | 0.6251                  | -                        |
+ | 0.1516     | 26000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1574     | 27000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1633     | 28000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1691     | 29000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1749     | 30000    | 0.0002        | 0.0001     | -0.0109      | 0.6304                  | -                        |
+ | 0.1808     | 31000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1866     | 32000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1924     | 33000    | 0.0002        | -          | -            | -                       | -                        |
+ | 0.1983     | 34000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.2041     | 35000    | 0.0001        | 0.0001     | -0.0102      | 0.6280                  | -                        |
+ | 0.2099     | 36000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.2158     | 37000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.2216     | 38000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.2274     | 39000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.2333     | 40000    | 0.0001        | 0.0001     | -0.0098      | 0.6272                  | -                        |
+ | 0.2391     | 41000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.2449     | 42000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.2507     | 43000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.2566     | 44000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.2624     | 45000    | 0.0001        | 0.0001     | -0.0093      | 0.6378                  | -                        |
+ | 0.2682     | 46000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.2741     | 47000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.2799     | 48000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.2857     | 49000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.2916     | 50000    | 0.0001        | 0.0001     | -0.0089      | 0.6325                  | -                        |
+ | 0.2974     | 51000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3032     | 52000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3091     | 53000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3149     | 54000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3207     | 55000    | 0.0001        | 0.0001     | -0.0087      | 0.6328                  | -                        |
+ | 0.3266     | 56000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3324     | 57000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3382     | 58000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3441     | 59000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3499     | 60000    | 0.0001        | 0.0001     | -0.0085      | 0.6357                  | -                        |
+ | 0.3557     | 61000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3615     | 62000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3674     | 63000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3732     | 64000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3790     | 65000    | 0.0001        | 0.0001     | -0.0083      | 0.6366                  | -                        |
+ | 0.3849     | 66000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3907     | 67000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.3965     | 68000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4024     | 69000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4082     | 70000    | 0.0001        | 0.0001     | -0.0080      | 0.6325                  | -                        |
+ | 0.4140     | 71000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4199     | 72000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4257     | 73000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4315     | 74000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4374     | 75000    | 0.0001        | 0.0001     | -0.0078      | 0.6351                  | -                        |
+ | 0.4432     | 76000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4490     | 77000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4548     | 78000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4607     | 79000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4665     | 80000    | 0.0001        | 0.0001     | -0.0077      | 0.6323                  | -                        |
+ | 0.4723     | 81000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4782     | 82000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4840     | 83000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4898     | 84000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.4957     | 85000    | 0.0001        | 0.0001     | -0.0076      | 0.6316                  | -                        |
+ | 0.5015     | 86000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.5073     | 87000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.5132     | 88000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.5190     | 89000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.5248     | 90000    | 0.0001        | 0.0001     | -0.0074      | 0.6306                  | -                        |
+ | 0.5307     | 91000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.5365     | 92000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.5423     | 93000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.5481     | 94000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.5540     | 95000    | 0.0001        | 0.0001     | -0.0073      | 0.6305                  | -                        |
+ | 0.5598     | 96000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.5656     | 97000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.5715     | 98000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.5773     | 99000    | 0.0001        | -          | -            | -                       | -                        |
+ | 0.5831     | 100000   | 0.0001        | 0.0001     | -0.0072      | 0.6333                  | -                        |
+ | 0.5890     | 101000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.5948     | 102000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6006     | 103000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6065     | 104000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6123     | 105000   | 0.0001        | 0.0001     | -0.0071      | 0.6351                  | -                        |
+ | 0.6181     | 106000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6240     | 107000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6298     | 108000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6356     | 109000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6415     | 110000   | 0.0001        | 0.0001     | -0.0070      | 0.6330                  | -                        |
+ | 0.6473     | 111000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6531     | 112000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6589     | 113000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6648     | 114000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6706     | 115000   | 0.0001        | 0.0001     | -0.0070      | 0.6336                  | -                        |
+ | 0.6764     | 116000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6823     | 117000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6881     | 118000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6939     | 119000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.6998     | 120000   | 0.0001        | 0.0001     | -0.0069      | 0.6305                  | -                        |
+ | 0.7056     | 121000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.7114     | 122000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.7173     | 123000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.7231     | 124000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.7289     | 125000   | 0.0001        | 0.0001     | -0.0068      | 0.6362                  | -                        |
+ | 0.7348     | 126000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.7406     | 127000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.7464     | 128000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.7522     | 129000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.7581     | 130000   | 0.0001        | 0.0001     | -0.0067      | 0.6340                  | -                        |
+ | 0.7639     | 131000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.7697     | 132000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.7756     | 133000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.7814     | 134000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.7872     | 135000   | 0.0001        | 0.0001     | -0.0067      | 0.6365                  | -                        |
+ | 0.7931     | 136000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.7989     | 137000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8047     | 138000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8106     | 139000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8164     | 140000   | 0.0001        | 0.0001     | -0.0066      | 0.6339                  | -                        |
+ | 0.8222     | 141000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8281     | 142000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8339     | 143000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8397     | 144000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8456     | 145000   | 0.0001        | 0.0001     | -0.0066      | 0.6352                  | -                        |
+ | 0.8514     | 146000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8572     | 147000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8630     | 148000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8689     | 149000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8747     | 150000   | 0.0001        | 0.0001     | -0.0065      | 0.6357                  | -                        |
+ | 0.8805     | 151000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8864     | 152000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8922     | 153000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.8980     | 154000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.9039     | 155000   | 0.0001        | 0.0001     | -0.0065      | 0.6336                  | -                        |
+ | 0.9097     | 156000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.9155     | 157000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.9214     | 158000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.9272     | 159000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.9330     | 160000   | 0.0001        | 0.0001     | -0.0064      | 0.6334                  | -                        |
+ | 0.9389     | 161000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.9447     | 162000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.9505     | 163000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.9563     | 164000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.9622     | 165000   | 0.0001        | 0.0001     | -0.0064      | 0.6337                  | -                        |
+ | 0.9680     | 166000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.9738     | 167000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.9797     | 168000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.9855     | 169000   | 0.0001        | -          | -            | -                       | -                        |
+ | 0.9913     | 170000   | 0.0001        | 0.0001     | -0.0063      | 0.6347                  | -                        |
+ | 0.9972     | 171000   | 0.0001        | -          | -            | -                       | -                        |
+ | 1.0        | 171486   | -             | -          | -            | -                       | 0.5986                   |
+ 
+ * The bold row denotes the saved checkpoint.
+ </details>
+ 
+ ### Framework Versions
+ - Python: 3.10.14
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.44.0
+ - PyTorch: 2.4.0
+ - Accelerate: 0.33.0
+ - Datasets: 2.20.0
+ - Tokenizers: 0.19.1
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### MSELoss
+ ```bibtex
+ @inproceedings{reimers-2020-multilingual-sentence-bert,
+     title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2020",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/2004.09813",
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+     "_name_or_path": "output/model-distillation-reduction-2024-08-19_09-02-08/final",
+     "architectures": [
+         "BertModel"
+     ],
+     "attention_probs_dropout_prob": 0.1,
+     "classifier_dropout": null,
+     "directionality": "bidi",
+     "gradient_checkpointing": false,
+     "hidden_act": "gelu",
+     "hidden_dropout_prob": 0.1,
+     "hidden_size": 768,
+     "initializer_range": 0.02,
+     "intermediate_size": 3072,
+     "layer_norm_eps": 1e-12,
+     "max_position_embeddings": 512,
+     "model_type": "bert",
+     "num_attention_heads": 12,
+     "num_hidden_layers": 5,
+     "pad_token_id": 0,
+     "pooler_fc_size": 768,
+     "pooler_num_attention_heads": 12,
+     "pooler_num_fc_layers": 3,
+     "pooler_size_per_head": 128,
+     "pooler_type": "first_token_transform",
+     "position_embedding_type": "absolute",
+     "torch_dtype": "float32",
+     "transformers_version": "4.44.0",
+     "type_vocab_size": 2,
+     "use_cache": true,
+     "vocab_size": 55083
+ }
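
With `num_hidden_layers: 5` against the 12 layers of LaBSE-en-ru, the repository name suggests the student was initialized by keeping roughly every third encoder layer of the teacher before distillation. A hypothetical sketch of that kind of reduction; the exact layer indices this model kept are not documented, so `keep` below is an assumption for illustration only:

```python
import torch
from transformers import AutoModel

teacher = AutoModel.from_pretrained("cointegrated/LaBSE-en-ru")  # 12 encoder layers

keep = [0, 3, 6, 9, 11]  # hypothetical: every third layer plus the last one
teacher.encoder.layer = torch.nn.ModuleList(
    [teacher.encoder.layer[i] for i in keep]
)
teacher.config.num_hidden_layers = len(keep)  # 5, matching the config above
teacher.save_pretrained("student-init")  # starting point for MSE distillation
```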
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+     "__version__": {
+         "sentence_transformers": "3.0.1",
+         "transformers": "4.44.0",
+         "pytorch": "2.4.0"
+     },
+     "prompts": {},
+     "default_prompt_name": null,
+     "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27a713701cd3b6060274d01cb82c813d64e79c373c41511fad1f206e1d27f7f8
+ size 314929664
modules.json ADDED
@@ -0,0 +1,26 @@
+ [
+     {
+         "idx": 0,
+         "name": "0",
+         "path": "",
+         "type": "sentence_transformers.models.Transformer"
+     },
+     {
+         "idx": 1,
+         "name": "1",
+         "path": "1_Pooling",
+         "type": "sentence_transformers.models.Pooling"
+     },
+     {
+         "idx": 2,
+         "name": "2",
+         "path": "2_Dense",
+         "type": "sentence_transformers.models.Dense"
+     },
+     {
+         "idx": 3,
+         "name": "3",
+         "path": "3_Normalize",
+         "type": "sentence_transformers.models.Normalize"
+     }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+     "max_seq_length": 512,
+     "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+     "cls_token": {
+         "content": "[CLS]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "mask_token": {
+         "content": "[MASK]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "pad_token": {
+         "content": "[PAD]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "sep_token": {
+         "content": "[SEP]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     },
+     "unk_token": {
+         "content": "[UNK]",
+         "lstrip": false,
+         "normalized": false,
+         "rstrip": false,
+         "single_word": false
+     }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+     "added_tokens_decoder": {
+         "0": {
+             "content": "[PAD]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "1": {
+             "content": "[UNK]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "2": {
+             "content": "[CLS]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "3": {
+             "content": "[SEP]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         },
+         "4": {
+             "content": "[MASK]",
+             "lstrip": false,
+             "normalized": false,
+             "rstrip": false,
+             "single_word": false,
+             "special": true
+         }
+     },
+     "clean_up_tokenization_spaces": true,
+     "cls_token": "[CLS]",
+     "do_basic_tokenize": true,
+     "do_lower_case": false,
+     "mask_token": "[MASK]",
+     "max_length": 512,
+     "model_max_length": 512,
+     "never_split": null,
+     "pad_to_multiple_of": null,
+     "pad_token": "[PAD]",
+     "pad_token_type_id": 0,
+     "padding_side": "right",
+     "sep_token": "[SEP]",
+     "stride": 0,
+     "strip_accents": null,
+     "tokenize_chinese_chars": true,
+     "tokenizer_class": "BertTokenizer",
+     "truncation_side": "right",
+     "truncation_strategy": "longest_first",
+     "unk_token": "[UNK]"
+ }
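
Per the settings above, this is a cased BERT WordPiece tokenizer with a 512-token limit. A quick sketch of it in use (the exact word pieces shown are illustrative, not verified output):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("whitemouse84/LaBSE-en-ru-distilled-each-third-layer")
enc = tok("Земля была мягкой.", truncation=True, max_length=512)
print(tok.convert_ids_to_tokens(enc["input_ids"]))
# e.g. ['[CLS]', 'Земля', 'была', 'мягкой', '.', '[SEP]'] -- actual pieces may differ
```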
vocab.txt ADDED
The diff for this file is too large to render. See raw diff