celik-muhammed committed on
Commit
545552b
1 Parent(s): c11cc95

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,624 @@
+ ---
+ language: []
+ library_name: sentence-transformers
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:43371
+ - loss:MultipleNegativesRankingLoss
+ base_model: sentence-transformers/all-MiniLM-L6-v2
+ datasets: []
+ metrics:
+ - cosine_accuracy
+ - cosine_accuracy_threshold
+ - cosine_f1
+ - cosine_f1_threshold
+ - cosine_precision
+ - cosine_recall
+ - cosine_ap
+ - dot_accuracy
+ - dot_accuracy_threshold
+ - dot_f1
+ - dot_f1_threshold
+ - dot_precision
+ - dot_recall
+ - dot_ap
+ - manhattan_accuracy
+ - manhattan_accuracy_threshold
+ - manhattan_f1
+ - manhattan_f1_threshold
+ - manhattan_precision
+ - manhattan_recall
+ - manhattan_ap
+ - euclidean_accuracy
+ - euclidean_accuracy_threshold
+ - euclidean_f1
+ - euclidean_f1_threshold
+ - euclidean_precision
+ - euclidean_recall
+ - euclidean_ap
+ - max_accuracy
+ - max_accuracy_threshold
+ - max_f1
+ - max_f1_threshold
+ - max_precision
+ - max_recall
+ - max_ap
+ widget:
+ - source_sentence: ' New Kids on the Block: Step by Step (1990/I) Step closer to
+     the New Kids on the Block as they share their newest songs, their hottest performances,
+     and their most personal thoughts. Join the guys as they look at where they came
+     from, where they are right now, and where they''re headed - step by step.'
+   sentences:
+   - Rare
+   - Rare
+   - thriller
+ - source_sentence: ' "Vampirism Bites" (2010) Vampire fan girl Belle always dreamed
+     of becoming a vampire, and finally got her wish on a blind date. She quickly discovers
+     the life of a vampire is not what books, movies and TV have told her, and learns
+     that Vampirism is not a 24/7 sexual and romantic fantasy. In fact, Vampirism Bites.'
+   sentences:
+   - thriller
+   - comedy
+   - Rare
+ - source_sentence: ' O Candidato Vieira (2005) A feature documentary about satirical
+     rock star Manuel João Vieira who ran as a candidate for the Presidency of Portugal
+     in 2001. Although he didn''t collect the number of signatures needed to officially
+     put him on the ballots, Vieira''s surreal campaign appearances on television talk
+     shows, radio and concerts took the country by storm and left everybody laughing.
+     A political, comedic and musical documentary!'
+   sentences:
+   - documentary
+   - short
+   - short
+ - source_sentence: ' Ani DiFranco: Live at Babeville (2008) On September 11 and 12,
+     2007, Ani DiFranco and her band (Allison Miller on drums, Todd Sickafoose on bass
+     and Mike Dillon on vibes and percussion) played two sold-out shows before a hometown
+     audience in Buffalo, New York. What made those nights so special wasn''t just
+     the music-that''s always special at an Ani show-but the fact that she was playing
+     the inaugural shows in her very own venue, "Babeville". Now the highlights of
+     the two shows are available on a single DVD featuring eighteen songs (two of which
+     have not yet appeared on studio albums), plus bonus sound check and interview
+     footage, all shot in high definition video and 5.1 surround sound. The result
+     is a must-have memento of Ani at her finest-onstage, playing her guitar and singing
+     with the passion, intensity, and joy that have made her a legend.'
+   sentences:
+   - drama
+   - Rare
+   - documentary
+ - source_sentence: ' "Oliver Twist" (1985) In a storm, in a workhouse, to a nameless
+     woman, young Oliver Twist is born into parish care where he''s overworked and
+     underfed. As he grows older his adventures take him from the countryside to London,
+     through harsh treatment, kindness, an undertaker, and a thieves'' dens, where
+     he makes friends and enemies. But all the time he is pursued by the mysterious
+     Monks, who hires Fagin to turn Oliver into a thief. Oliver is rescued by chance
+     and kind friends. But it''s a puzzle of legitimacy, inheritance, and identity
+     that Oliver''s friends must attempt to unravel before Monks can destroy Oliver.'
+   sentences:
+   - documentary
+   - drama
+   - drama
+ pipeline_tag: sentence-similarity
+ model-index:
+ - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+   results:
+   - task:
+       type: binary-classification
+       name: Binary Classification
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: cosine_accuracy
+       value: 0.900683492678328
+       name: Cosine Accuracy
+     - type: cosine_accuracy_threshold
+       value: 0.601991593837738
+       name: Cosine Accuracy Threshold
+     - type: cosine_f1
+       value: 0.4642871879513101
+       name: Cosine F1
+     - type: cosine_f1_threshold
+       value: 0.520057201385498
+       name: Cosine F1 Threshold
+     - type: cosine_precision
+       value: 0.4201015531660693
+       name: Cosine Precision
+     - type: cosine_recall
+       value: 0.5188600940699069
+       name: Cosine Recall
+     - type: cosine_ap
+       value: 0.46368250557502916
+       name: Cosine Ap
+     - type: dot_accuracy
+       value: 0.900683492678328
+       name: Dot Accuracy
+     - type: dot_accuracy_threshold
+       value: 0.6019916534423828
+       name: Dot Accuracy Threshold
+     - type: dot_f1
+       value: 0.4642871879513101
+       name: Dot F1
+     - type: dot_f1_threshold
+       value: 0.5200573205947876
+       name: Dot F1 Threshold
+     - type: dot_precision
+       value: 0.4201015531660693
+       name: Dot Precision
+     - type: dot_recall
+       value: 0.5188600940699069
+       name: Dot Recall
+     - type: dot_ap
+       value: 0.4636826492476884
+       name: Dot Ap
+     - type: manhattan_accuracy
+       value: 0.900304343816287
+       name: Manhattan Accuracy
+     - type: manhattan_accuracy_threshold
+       value: 13.547416687011719
+       name: Manhattan Accuracy Threshold
+     - type: manhattan_f1
+       value: 0.45818772856562373
+       name: Manhattan F1
+     - type: manhattan_f1_threshold
+       value: 15.149662017822266
+       name: Manhattan F1 Threshold
+     - type: manhattan_precision
+       value: 0.40953003559235857
+       name: Manhattan Precision
+     - type: manhattan_recall
+       value: 0.5199667988564051
+       name: Manhattan Recall
+     - type: manhattan_ap
+       value: 0.45787992811626
+       name: Manhattan Ap
+     - type: euclidean_accuracy
+       value: 0.900683492678328
+       name: Euclidean Accuracy
+     - type: euclidean_accuracy_threshold
+       value: 0.8921977281570435
+       name: Euclidean Accuracy Threshold
+     - type: euclidean_f1
+       value: 0.4642871879513101
+       name: Euclidean F1
+     - type: euclidean_f1_threshold
+       value: 0.979737401008606
+       name: Euclidean F1 Threshold
+     - type: euclidean_precision
+       value: 0.4201015531660693
+       name: Euclidean Precision
+     - type: euclidean_recall
+       value: 0.5188600940699069
+       name: Euclidean Recall
+     - type: euclidean_ap
+       value: 0.46368245984449313
+       name: Euclidean Ap
+     - type: max_accuracy
+       value: 0.900683492678328
+       name: Max Accuracy
+     - type: max_accuracy_threshold
+       value: 13.547416687011719
+       name: Max Accuracy Threshold
+     - type: max_f1
+       value: 0.4642871879513101
+       name: Max F1
+     - type: max_f1_threshold
+       value: 15.149662017822266
+       name: Max F1 Threshold
+     - type: max_precision
+       value: 0.4201015531660693
+       name: Max Precision
+     - type: max_recall
+       value: 0.5199667988564051
+       name: Max Recall
+     - type: max_ap
+       value: 0.4636826492476884
+       name: Max Ap
+   - task:
+       type: triplet
+       name: Triplet
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: cosine_accuracy
+       value: 0.6381767038642442
+       name: Cosine Accuracy
+     - type: dot_accuracy
+       value: 0.3618232961357558
+       name: Dot Accuracy
+     - type: manhattan_accuracy
+       value: 0.6227289495527069
+       name: Manhattan Accuracy
+     - type: euclidean_accuracy
+       value: 0.6381767038642442
+       name: Euclidean Accuracy
+     - type: max_accuracy
+       value: 0.6381767038642442
+       name: Max Accuracy
+ ---
+
+ # SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) on the imdb-triplet dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision 8b3219a92973c328a8e22fadcfa821b5dc75636a -->
+ - **Maximum Sequence Length:** 256 tokens
+ - **Output Dimensionality:** 384 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - imdb-triplet
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
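+
+ For readers who want to see what these three modules do without the `sentence-transformers` wrapper, here is a minimal sketch using plain `transformers` (the input sentences are placeholders; everything else follows the architecture block above): token embeddings from the BertModel, mean pooling over non-padding tokens as specified by the pooling config, then L2 normalization.
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoModel, AutoTokenizer
+
+ model_id = "celik-muhammed/all-MiniLM-L6-v2-finetuned-imdb"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModel.from_pretrained(model_id)
+
+ # Placeholder inputs; max_length matches the 256-token limit above
+ batch = tokenizer(["a movie plot summary", "drama"], padding=True,
+                   truncation=True, max_length=256, return_tensors="pt")
+ with torch.no_grad():
+     token_embeddings = model(**batch).last_hidden_state  # (batch, seq_len, 384)
+
+ # Pooling: mean over real tokens only (pooling_mode_mean_tokens=True)
+ mask = batch["attention_mask"].unsqueeze(-1).float()
+ embeddings = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
+
+ # Normalize: unit-length vectors, so dot product equals cosine similarity
+ embeddings = F.normalize(embeddings, p=2, dim=1)
+ print(embeddings.shape)  # torch.Size([2, 384])
+ ```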
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("celik-muhammed/all-MiniLM-L6-v2-finetuned-imdb")
+ # Run inference
+ sentences = [
+     ' "Oliver Twist" (1985) In a storm, in a workhouse, to a nameless woman, young Oliver Twist is born into parish care where he\'s overworked and underfed. As he grows older his adventures take him from the countryside to London, through harsh treatment, kindness, an undertaker, and a thieves\' dens, where he makes friends and enemies. But all the time he is pursued by the mysterious Monks, who hires Fagin to turn Oliver into a thief. Oliver is rescued by chance and kind friends. But it\'s a puzzle of legitimacy, inheritance, and identity that Oliver\'s friends must attempt to unravel before Monks can destroy Oliver.',
+     'drama',
+     'documentary',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 384)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Binary Classification
+
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
+
+ | Metric                       | Value      |
+ |:-----------------------------|:-----------|
+ | cosine_accuracy              | 0.9007     |
+ | cosine_accuracy_threshold    | 0.602      |
+ | cosine_f1                    | 0.4643     |
+ | cosine_f1_threshold          | 0.5201     |
+ | cosine_precision             | 0.4201     |
+ | cosine_recall                | 0.5189     |
+ | cosine_ap                    | 0.4637     |
+ | dot_accuracy                 | 0.9007     |
+ | dot_accuracy_threshold       | 0.602      |
+ | dot_f1                       | 0.4643     |
+ | dot_f1_threshold             | 0.5201     |
+ | dot_precision                | 0.4201     |
+ | dot_recall                   | 0.5189     |
+ | dot_ap                       | 0.4637     |
+ | manhattan_accuracy           | 0.9003     |
+ | manhattan_accuracy_threshold | 13.5474    |
+ | manhattan_f1                 | 0.4582     |
+ | manhattan_f1_threshold       | 15.1497    |
+ | manhattan_precision          | 0.4095     |
+ | manhattan_recall             | 0.52       |
+ | manhattan_ap                 | 0.4579     |
+ | euclidean_accuracy           | 0.9007     |
+ | euclidean_accuracy_threshold | 0.8922     |
+ | euclidean_f1                 | 0.4643     |
+ | euclidean_f1_threshold       | 0.9797     |
+ | euclidean_precision          | 0.4201     |
+ | euclidean_recall             | 0.5189     |
+ | euclidean_ap                 | 0.4637     |
+ | max_accuracy                 | 0.9007     |
+ | max_accuracy_threshold       | 13.5474    |
+ | max_f1                       | 0.4643     |
+ | max_f1_threshold             | 15.1497    |
+ | max_precision                | 0.4201     |
+ | max_recall                   | 0.52       |
+ | **max_ap**                   | **0.4637** |
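+
+ The evaluation pairs themselves are not shipped with this repository, so the snippet below is a hypothetical sketch of how numbers like these are produced: the sentence pairs and labels are placeholders standing in for the real evaluation split.
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import BinaryClassificationEvaluator
+
+ model = SentenceTransformer("celik-muhammed/all-MiniLM-L6-v2-finetuned-imdb")
+
+ # Placeholder pairs: label 1 = description matches the genre, 0 = it does not
+ evaluator = BinaryClassificationEvaluator(
+     sentences1=["A feature documentary about a satirical rock star.",
+                 "A teenage boy rides out of town to meet a girl."],
+     sentences2=["documentary", "thriller"],
+     labels=[1, 0],
+ )
+ results = evaluator(model)  # dict of the accuracy/F1/AP metrics tabulated above
+ ```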
+
+ #### Triplet
+
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric             | Value      |
+ |:-------------------|:-----------|
+ | cosine_accuracy    | 0.6382     |
+ | dot_accuracy       | 0.3618     |
+ | manhattan_accuracy | 0.6227     |
+ | euclidean_accuracy | 0.6382     |
+ | **max_accuracy**   | **0.6382** |
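+
+ Likewise, a hypothetical sketch of the triplet evaluation: each row is an (anchor, positive, negative) triple, and accuracy is the fraction of rows where the anchor embedding lands closer to the positive than to the negative. The three strings below are placeholders for held-out imdb-triplet rows.
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import TripletEvaluator
+
+ model = SentenceTransformer("celik-muhammed/all-MiniLM-L6-v2-finetuned-imdb")
+
+ evaluator = TripletEvaluator(
+     anchors=["A feature documentary about a satirical rock star."],
+     positives=["documentary"],
+     negatives=["thriller"],
+ )
+ results = evaluator(model)  # cosine/dot/manhattan/euclidean accuracies as above
+ ```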
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### imdb-triplet
+
+ * Dataset: imdb-triplet
+ * Size: 43,371 training samples
+ * Columns: <code>anchor</code> and <code>positive</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor                                                                                | positive                                                                        |
+   |:--------|:--------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+   | type    | string                                                                                  | string                                                                              |
+   | details | <ul><li>min: 31 tokens</li><li>mean: 129.65 tokens</li><li>max: 256 tokens</li></ul>   | <ul><li>min: 3 tokens</li><li>mean: 3.0 tokens</li><li>max: 3 tokens</li></ul>     |
+ * Samples:
+   | anchor | positive |
+   |:-------|:---------|
+   | <code> A Metafísica dos Chocolates (1967) Beautiful girls (pre-teens, adolescents, and young women) in street scenes and one of them visiting a chocolate factory, where all the workers are young women, too. A poetic text and an extract from a major Portuguese poet, convey to us the sensual feeling of choosing, unwrapping, and munching chocolate.</code> | <code>short</code> |
+   | <code> Thai Jashe! (2016) Thai Jashe! is an upcoming Gujarati film written and directed by Nirav Barot. It is about the struggles of a middle class man to achieve his goals in the metro-city Ahmedabad. The film stars Manoj Joshi, Malhar Thakar and Monal Gajjar.</code> | <code>drama</code> |
+   | <code> Vuelco (2005) A teenage boy rides out of town to meet a girl in the countryside. She is deaf, and he explains the different means he uses to get her attention when she has not seen him. Then they say goodbye, with one poignant hug and a desperate yell punctuating their final farewell.</code> | <code>short</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 256
+ - `per_device_eval_batch_size`: 256
+ - `num_train_epochs`: 5
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: round_robin
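+
+ Combining the loss above with these non-default hyperparameters, a comparable fine-tuning run would look roughly like the sketch below. Assumptions: the v3 `SentenceTransformerTrainer` API from the Framework Versions section; the training rows are placeholders, since the imdb-triplet data is not bundled here; and the original run additionally wired up an evaluator for its `eval_strategy: steps` setting, which is omitted to keep the sketch self-contained.
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ from sentence_transformers.losses import MultipleNegativesRankingLoss
+ from sentence_transformers.training_args import BatchSamplers
+
+ model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
+
+ # Placeholder rows; the real imdb-triplet set has 43,371 (anchor, positive) pairs
+ train_dataset = Dataset.from_dict({
+     "anchor": [" Vuelco (2005) A teenage boy rides out of town to meet a girl..."],
+     "positive": ["short"],
+ })
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="all-MiniLM-L6-v2-finetuned-imdb",
+     num_train_epochs=5,
+     per_device_train_batch_size=256,
+     per_device_eval_batch_size=256,
+     warmup_ratio=0.1,
+     fp16=True,
+     batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate texts per batch
+ )
+
+ trainer = SentenceTransformerTrainer(
+     model=model,
+     args=args,
+     train_dataset=train_dataset,
+     loss=MultipleNegativesRankingLoss(model),  # defaults: scale=20.0, cos_sim
+ )
+ trainer.train()
+ ```
+
+ With `MultipleNegativesRankingLoss`, every other `positive` in the batch serves as an in-batch negative for a given `anchor`, which is why the `no_duplicates` batch sampler matters here: duplicate genre labels in one batch would otherwise be scored as negatives for each other.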
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 256
+ - `per_device_eval_batch_size`: 256
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 5
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ | Epoch  | Step | Training Loss | max_accuracy | max_ap |
+ |:------:|:----:|:-------------:|:------------:|:------:|
+ | 0      | 0    | -             | 0.6382       | 0.2004 |
+ | 0.5882 | 100  | 1.7867        | -            | 0.3542 |
+ | 1.1765 | 200  | 1.3073        | -            | 0.4564 |
+ | 1.7647 | 300  | 1.266         | -            | 0.3862 |
+ | 2.3529 | 400  | 1.1889        | -            | 0.4011 |
+ | 2.9412 | 500  | 1.1554        | -            | 0.4398 |
+ | 3.5294 | 600  | 1.1558        | -            | 0.4386 |
+ | 4.1176 | 700  | 1.1555        | -            | 0.4566 |
+ | 4.7059 | 800  | 1.0835        | -            | 0.4637 |
+
+
+ ### Framework Versions
+ - Python: 3.10.13
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.41.2
+ - PyTorch: 2.1.2
+ - Accelerate: 0.30.1
+ - Datasets: 2.19.2
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "sentence-transformers/all-MiniLM-L6-v2",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 6,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
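
A quick sanity-check sketch (assuming only this config and the `transformers` library): the backbone dimensions stated above can be confirmed directly from the Hub.

```python
from transformers import AutoConfig

# Should match the config.json above: 384-dim hidden states, 6 layers
config = AutoConfig.from_pretrained("celik-muhammed/all-MiniLM-L6-v2-finetuned-imdb")
assert config.hidden_size == 384 and config.num_hidden_layers == 6
```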
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.41.2",
+     "pytorch": "2.1.2"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1bd9ffe053c5acb0d586d8437beae28cfeb0a4d5401dfafb98e67a434d16b04d
+ size 90864192
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 256,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 128,
+   "model_max_length": 256,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff