uhoffmann committed on
Commit
d5d0eb6
1 Parent(s): b0ff0ac

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
```json
{
    "word_embedding_dimension": 768,
    "pooling_mode_cls_token": true,
    "pooling_mode_mean_tokens": false,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
```
README.md ADDED
---
base_model: BAAI/bge-base-en-v1.5
datasets: []
language:
- en
library_name: sentence-transformers
license: apache-2.0
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: Mergers and acquisitions, joint ventures and strategic investments
    complement our internal development and enhance our partnerships to align with
    Visa’s priorities.
  sentences:
  - How much did the unbilled accounts receivable amount to as of December 30, 2023?
  - What was the main reason for Visa to engage in mergers and acquisitions, joint
    ventures, and strategic investments?
  - What is the mission of Intuit?
- source_sentence: Garmin’s audio brands, Fusion and JL Audio, offer premium audio
    products and accessories, including head units, speakers, amplifiers, subwoofers,
    and other audio components. These products are designed specifically for the marine,
    powersports, aftermarket automotive, home, or RV environments, offering premium
    sound quality and supporting many connectivity options for integrating with MFDs,
    smartphones, and Garmin wearables.
  sentences:
  - What type of insurance policies cover some of the defense and settlement costs
    associated with litigation mentioned?
  - What types of audio products does Garmin's Fusion and JL Audio brands offer?
  - What should investors consider when comparing Adjusted EBITDA across different
    companies?
- source_sentence: Medical device products that are marketed in the European Union
    must comply with the requirements of the Medical Device Regulation (the MDR),
    which came into effect in May 2021. The MDR provides for regulatory oversight
    with respect to the design, manufacture, clinical trials, labeling and adverse
    event reporting for medical devices.
  sentences:
  - What are the requirements for medical devices to be marketed in the European Union
    under the MDR?
  - By what percentage did the pre-tax earnings increase from 2021 to 2022 in the
    manufacturing sector?
  - What were the cash and cash equivalents at the end of 2023?
- source_sentence: In March 2023, the Board of Directors sanctioned a restructuring
    plan concentrated on investment prioritization towards significant growth prospects
    and the optimization of the company's real estate assets. This includes substantial
    organizational changes such as reductions in office space and workforce.
  sentences:
  - How many physicians are part of the domestic Office of the Chief Medical Officer
    at DaVita as of December 31, 2023?
  - What changes in expenses did Delta Air Lines' ancillary businesses and refinery
    segment encounter in 2023 compared to 2022?
  - What are the restructuring targets of the company's Board of Directors as of 2023?
- source_sentence: The quality of GM dealerships and our relationship with our dealers
    are critical to our success, now, and as we transition to our all-electric future,
    given that they maintain the primary sales and service interface with the end
    consumer of our products. In addition to the terms of our contracts with our dealers,
    we are regulated by various country and state franchise laws and regulations that
    may supersede those contractual terms and impose specific regulatory
  sentences:
  - How does General[39 chars] Motors ensure quality in their dealership network?
  - How can the public access the company's financial and legal reports?
  - Is the outcome of the investigation into Tesla's waste segregation practices currently
    determinable?
model-index:
- name: BGE base Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.6785714285714286
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8171428571428572
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8671428571428571
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.91
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6785714285714286
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2723809523809524
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1734285714285714
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09099999999999998
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6785714285714286
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8171428571428572
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8671428571428571
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.91
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7949318413045188
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7579920634920636
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.761780829563342
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.6714285714285714
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8171428571428572
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8642857142857143
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9028571428571428
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6714285714285714
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2723809523809524
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17285714285714285
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09028571428571427
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6714285714285714
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8171428571428572
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8642857142857143
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9028571428571428
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7892232861723367
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7524767573696142
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7566816338836445
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.6671428571428571
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8142857142857143
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8657142857142858
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9028571428571428
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6671428571428571
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2714285714285714
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17314285714285713
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09028571428571427
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6671428571428571
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8142857142857143
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8657142857142858
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9028571428571428
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.786715703830093
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.749225056689342
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7532686203724872
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.6542857142857142
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8071428571428572
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8428571428571429
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6542857142857142
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.26904761904761904
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.16857142857142854
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6542857142857142
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8071428571428572
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8428571428571429
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7763972670750712
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7369308390022671
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7407041984815913
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.62
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.7671428571428571
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8171428571428572
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8785714285714286
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.62
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.2557142857142857
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.16342857142857142
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08785714285714284
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.62
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.7671428571428571
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8171428571428572
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8785714285714286
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7482796784963641
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7067517006802718
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7110201251131743
      name: Cosine Map@100
---

# BGE base Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
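
The three modules above correspond to a BERT encoder, CLS-token pooling, and L2 normalization. For reference, a minimal sketch of the equivalent pipeline in plain `transformers` (not the recommended loading path; prefer the Sentence Transformers snippet in the Usage section below):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("uhoffmann/bge-base-financial-matryoshka")
model = AutoModel.from_pretrained("uhoffmann/bge-base-financial-matryoshka")
model.eval()

batch = tokenizer(
    ["What was Walmart Inc.'s total revenue in the fiscal year ended January 31, 2023?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (batch, seq_len, 768)
cls_embeddings = token_embeddings[:, 0]                  # CLS-token pooling
embeddings = torch.nn.functional.normalize(cls_embeddings, p=2, dim=1)  # Normalize() module
```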

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("uhoffmann/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'The quality of GM dealerships and our relationship with our dealers are critical to our success, now, and as we transition to our all-electric future, given that they maintain the primary sales and service interface with the end consumer of our products. In addition to the terms of our contracts with our dealers, we are regulated by various country and state franchise laws and regulations that may supersede those contractual terms and impose specific regulatory',
    'How does General[39 chars] Motors ensure quality in their dealership network?',
    "How can the public access the company's financial and legal reports?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
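
Because the model was trained with `MatryoshkaLoss` over the dimensions 768, 512, 256, 128 and 64, embeddings can be truncated to one of the smaller sizes with only a modest quality drop (see the evaluation tables below). A minimal sketch, assuming the `truncate_dim` argument available in recent Sentence Transformers releases:

```python
from sentence_transformers import SentenceTransformer

# Load with a smaller Matryoshka dimension; 256 is one of the trained sizes
model = SentenceTransformer("uhoffmann/bge-base-financial-matryoshka", truncate_dim=256)

embeddings = model.encode([
    "What was Walmart Inc.'s total revenue in the fiscal year ended January 31, 2023?",
    "Walmart Inc. reported total revenues of $611,289 million for the fiscal year ended January 31, 2023.",
])
print(embeddings.shape)
# (2, 256)

# Cosine similarity works on the truncated embeddings as well
similarities = model.similarity(embeddings, embeddings)
```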

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval
* Dataset: `dim_768`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6786     |
| cosine_accuracy@3   | 0.8171     |
| cosine_accuracy@5   | 0.8671     |
| cosine_accuracy@10  | 0.91       |
| cosine_precision@1  | 0.6786     |
| cosine_precision@3  | 0.2724     |
| cosine_precision@5  | 0.1734     |
| cosine_precision@10 | 0.091      |
| cosine_recall@1     | 0.6786     |
| cosine_recall@3     | 0.8171     |
| cosine_recall@5     | 0.8671     |
| cosine_recall@10    | 0.91       |
| cosine_ndcg@10      | 0.7949     |
| cosine_mrr@10       | 0.758      |
| **cosine_map@100**  | **0.7618** |
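
The figures in these tables can be reproduced with the evaluator linked above. A sketch of how such an evaluation is set up, using hypothetical ids and a toy corpus in place of the held-out evaluation split (which is not published with this card); the smaller-dimension tables below are obtained the same way after truncating the embeddings:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Hypothetical evaluation data: id -> text mappings plus the relevance judgments
queries = {"q1": "What was Walmart Inc.'s total revenue in the fiscal year ended January 31, 2023?"}
corpus = {"d1": "Walmart Inc. reported total revenues of $611,289 million for the fiscal year ended January 31, 2023."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dim_768")
model = SentenceTransformer("uhoffmann/bge-base-financial-matryoshka")
results = evaluator(model)  # dict of metrics, e.g. accuracy@k, ndcg@10, map@100
```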

#### Information Retrieval
* Dataset: `dim_512`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6714     |
| cosine_accuracy@3   | 0.8171     |
| cosine_accuracy@5   | 0.8643     |
| cosine_accuracy@10  | 0.9029     |
| cosine_precision@1  | 0.6714     |
| cosine_precision@3  | 0.2724     |
| cosine_precision@5  | 0.1729     |
| cosine_precision@10 | 0.0903     |
| cosine_recall@1     | 0.6714     |
| cosine_recall@3     | 0.8171     |
| cosine_recall@5     | 0.8643     |
| cosine_recall@10    | 0.9029     |
| cosine_ndcg@10      | 0.7892     |
| cosine_mrr@10       | 0.7525     |
| **cosine_map@100**  | **0.7567** |

#### Information Retrieval
* Dataset: `dim_256`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6671     |
| cosine_accuracy@3   | 0.8143     |
| cosine_accuracy@5   | 0.8657     |
| cosine_accuracy@10  | 0.9029     |
| cosine_precision@1  | 0.6671     |
| cosine_precision@3  | 0.2714     |
| cosine_precision@5  | 0.1731     |
| cosine_precision@10 | 0.0903     |
| cosine_recall@1     | 0.6671     |
| cosine_recall@3     | 0.8143     |
| cosine_recall@5     | 0.8657     |
| cosine_recall@10    | 0.9029     |
| cosine_ndcg@10      | 0.7867     |
| cosine_mrr@10       | 0.7492     |
| **cosine_map@100**  | **0.7533** |

#### Information Retrieval
* Dataset: `dim_128`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6543     |
| cosine_accuracy@3   | 0.8071     |
| cosine_accuracy@5   | 0.8429     |
| cosine_accuracy@10  | 0.9        |
| cosine_precision@1  | 0.6543     |
| cosine_precision@3  | 0.269      |
| cosine_precision@5  | 0.1686     |
| cosine_precision@10 | 0.09       |
| cosine_recall@1     | 0.6543     |
| cosine_recall@3     | 0.8071     |
| cosine_recall@5     | 0.8429     |
| cosine_recall@10    | 0.9        |
| cosine_ndcg@10      | 0.7764     |
| cosine_mrr@10       | 0.7369     |
| **cosine_map@100**  | **0.7407** |

#### Information Retrieval
* Dataset: `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value     |
|:--------------------|:----------|
| cosine_accuracy@1   | 0.62      |
| cosine_accuracy@3   | 0.7671    |
| cosine_accuracy@5   | 0.8171    |
| cosine_accuracy@10  | 0.8786    |
| cosine_precision@1  | 0.62      |
| cosine_precision@3  | 0.2557    |
| cosine_precision@5  | 0.1634    |
| cosine_precision@10 | 0.0879    |
| cosine_recall@1     | 0.62      |
| cosine_recall@3     | 0.7671    |
| cosine_recall@5     | 0.8171    |
| cosine_recall@10    | 0.8786    |
| cosine_ndcg@10      | 0.7483    |
| cosine_mrr@10       | 0.7068    |
| **cosine_map@100**  | **0.711** |

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 6,300 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 1000 samples:
  |         | positive                                                                            | anchor                                                                             |
  |:--------|:------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|
  | type    | string                                                                                | string                                                                                 |
  | details | <ul><li>min: 2 tokens</li><li>mean: 44.88 tokens</li><li>max: 272 tokens</li></ul>   | <ul><li>min: 2 tokens</li><li>mean: 20.58 tokens</li><li>max: 45 tokens</li></ul>     |
* Samples:
  | positive | anchor |
  |:---------|:-------|
  | <code>Walmart Inc. reported total revenues of $611,289 million for the fiscal year ended January 31, 2023.</code> | <code>What was Walmart Inc.'s total revenue in the fiscal year ended January 31, 2023?</code> |
  | <code>The total equity balance of Visa Inc. as of September 30, 2023 was $38,733 million.</code> | <code>What was the total equity of Visa Inc. as of September 30, 2023?</code> |
  | <code>Nike incorporates new technologies in its product design by using market intelligence and research, which helps its design teams identify opportunities to leverage these technologies in existing categories to respond to consumer preferences.</code> | <code>How does Nike incorporate new technologies in its product design?</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
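
In code, this configuration corresponds to wrapping `MultipleNegativesRankingLoss` in `MatryoshkaLoss`, so the ranking loss is applied at every truncated embedding size. A minimal sketch:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Inner loss: in-batch negatives over (anchor, positive) pairs
inner_loss = MultipleNegativesRankingLoss(model)
# Outer loss: apply the inner loss at each Matryoshka dimension
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
```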

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
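
Put together, a training run with these non-default hyperparameters would look roughly as follows. This is a sketch, assuming the 6,300 (positive, anchor) pairs are already loaded as a `datasets.Dataset` called `train_dataset`, that `loss` is the `MatryoshkaLoss` from the sketch above, and adding an explicit `save_strategy` (assumed) so the best checkpoint can be reloaded:

```python
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-financial-matryoshka",  # hypothetical output path
    num_train_epochs=4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,  # effective batch size 32 * 16 = 512
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=True,
    eval_strategy="epoch",
    save_strategy="epoch",  # assumed; must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,  # assumed: columns "positive" and "anchor"
    loss=loss,                    # MatryoshkaLoss from the previous sketch
)
trainer.train()
```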

### Training Logs
| Epoch      | Step   | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
| 0.8122     | 10     | 1.5521        | -                      | -                      | -                      | -                     | -                      |
| 0.9746     | 12     | -             | 0.7178                 | 0.7352                 | 0.7404                 | 0.6833                | 0.7422                 |
| 1.6244     | 20     | 0.6753        | -                      | -                      | -                      | -                     | -                      |
| 1.9492     | 24     | -             | 0.7340                 | 0.7452                 | 0.7524                 | 0.7057                | 0.7561                 |
| 2.4365     | 30     | 0.4611        | -                      | -                      | -                      | -                     | -                      |
| 2.9239     | 36     | -             | 0.7392                 | 0.7509                 | 0.7560                 | 0.7103                | 0.7588                 |
| 3.2487     | 40     | 0.3763        | -                      | -                      | -                      | -                     | -                      |
| **3.8985** | **48** | **-**         | **0.7407**             | **0.7533**             | **0.7567**             | **0.711**             | **0.7618**             |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.44.2
- PyTorch: 2.4.0+cu121
- Accelerate: 0.32.1
- Datasets: 2.21.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
```json
{
  "_name_or_path": "BAAI/bge-base-en-v1.5",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.44.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
```
config_sentence_transformers.json ADDED
```json
{
  "__version__": {
    "sentence_transformers": "3.0.1",
    "transformers": "4.44.2",
    "pytorch": "2.4.0+cu121"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": null
}
```
model.safetensors ADDED
```
version https://git-lfs.github.com/spec/v1
oid sha256:647327aefaaa7462ed037db7a3ba3796c99a1f5e55e771ed557d15dfe1f299b2
size 437951328
```
modules.json ADDED
```json
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
```
sentence_bert_config.json ADDED
```json
{
  "max_seq_length": 512,
  "do_lower_case": true
}
```
special_tokens_map.json ADDED
```json
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
```
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
```json
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
```
vocab.txt ADDED
The diff for this file is too large to render. See raw diff