akashmaggon committed
Commit 0246a62
Parent: d12e42f

Add new SentenceTransformer model.

1_Pooling/config.json ADDED

{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}

README.md ADDED

---
base_model: BAAI/bge-base-en-v1.5
datasets: []
language:
- en
library_name: sentence-transformers
license: apache-2.0
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:6300
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: The U.S. International Trade Commission (ITC) has become a significant forum to litigate intellectual property disputes. An adverse result in an ITC action can lead to a prohibition on importing infringing products, which, given the importance of the U.S. market, could significantly impact a company including preventing the importation of many important products or necessitating workarounds that may limit certain features of their products.
  sentences:
  - What was the overall impact of foreign currencies on net sales in 2023?
  - What potential consequences could result from intellectual property disputes in the U.S. International Trade Commission for the company?
  - What was the total purchase consideration for the VMware acquisition?
- source_sentence: Reinsurance contracts are normally classified as treaty or facultative contracts. Treaty reinsurance refers to reinsurance coverage for all or a portion of a specified group or class of risks ceded by a direct insurer or reinsurer, while facultative reinsurance involves coverage of specific individual underlying risks. Reinsurance contracts are further classified as quota-share or excess.
  sentences:
  - What type of information will you find under 'Note 13 — Commitments and Contingencies' in an Annual Report on Form 10-K?
  - What type of reinsurance contracts are offered by Berkshire Hathaway Reinsurance Group?
  - What are the consequences for a company violating anti-bribery laws in the U.S.?
- source_sentence: Commitments and contingencies related to legal proceedings are detailed in Part II, Item 8, under 'Financial Statements and Supplementary Data – Note 14'.
  sentences:
  - Where can one find commitments and contingencies related to legal proceedings in the context provided?
  - What is discussed in Item 3. Legal Proceedings of a company's report?
  - How are net realized capital gains and losses treated in the financial statements according to the Company?
- source_sentence: The “Glossary of Terms and Acronyms” is included on pages 315-321 in the set of financial documents.
  sentences:
  - What are the principles used in preparing the discussed financial statements?
  - What is the total remaining budget for future common stock repurchases under the company's stock repurchase programs as of December 31, 2023?
  - Where is the “Glossary of Terms and Acronyms” located in a set of financial documents?
- source_sentence: The table presents our market risk by asset category for positions accounted for at fair value or accounted for at the lower of cost or fair value, that are not included in VaR. As of December 2023, equity was at $1,562 million and debt was at $2,446 million.
  sentences:
  - What are the market risk values for Goldman Sachs' equity and debt positions not included in VaR as of December 2023?
  - What was the conclusion of the Company's review regarding the impact of the American Rescue Plan, the Consolidated Appropriations Act, 2021, and related tax provisions on its business for the fiscal year ended June 30, 2023?
  - How much did the company's finance lease obligations total as of December 31, 2023?
model-index:
- name: BGE base Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.6957142857142857
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8371428571428572
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8714285714285714
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9242857142857143
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6957142857142857
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.27904761904761904
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17428571428571424
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09242857142857142
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6957142857142857
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8371428571428572
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8714285714285714
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9242857142857143
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8105294489003092
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7741910430839002
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7773317927980538
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.7
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8285714285714286
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8671428571428571
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9185714285714286
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.7
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.27619047619047615
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1734285714285714
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09185714285714283
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.7
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8285714285714286
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8671428571428571
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9185714285714286
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8090367290103152
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7740351473922898
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7776494145961331
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.6928571428571428
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8185714285714286
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8585714285714285
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.91
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6928571428571428
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.27285714285714285
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.17171428571428568
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09099999999999998
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6928571428571428
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8185714285714286
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8585714285714285
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.91
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.8016663265681359
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7669977324263035
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7711841838569463
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.6871428571428572
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.8071428571428572
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8585714285714285
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8985714285714286
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6871428571428572
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.26904761904761904
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1717142857142857
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08985714285714283
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6871428571428572
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.8071428571428572
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8585714285714285
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8985714285714286
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7921056491431833
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7580946712018135
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7627063166788922
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.6642857142857143
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.7842857142857143
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.8257142857142857
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.8728571428571429
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.6642857142857143
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.26142857142857145
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.16514285714285715
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.08728571428571427
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.6642857142857143
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.7842857142857143
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.8257142857142857
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.8728571428571429
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.7689727571743198
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.7358214285714282
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.7406658506857838
      name: Cosine Map@100
---

# BGE base Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("akashmaggon/bge-base-financial-matryoshka")
# Run inference
sentences = [
    'The table presents our market risk by asset category for positions accounted for at fair value or accounted for at the lower of cost or fair value, that are not included in VaR. As of December 2023, equity was at $1,562 million and debt was at $2,446 million.',
    "What are the market risk values for Goldman Sachs' equity and debt positions not included in VaR as of December 2023?",
    "What was the conclusion of the Company's review regarding the impact of the American Rescue Plan, the Consolidated Appropriations Act, 2021, and related tax provisions on its business for the fiscal year ended June 30, 2023?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
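
Because the model was trained with `MatryoshkaLoss`, the leading dimensions of each embedding carry most of the information, so vectors can be truncated for cheaper storage and faster search. A minimal sketch, assuming Sentence Transformers >= 2.7 (which added the `truncate_dim` argument); the example texts are taken from the training samples shown further below:

```python
from sentence_transformers import SentenceTransformer

# Load the model so that encode() keeps only the first 256 dimensions.
# Dimensions this model was trained to support: 768, 512, 256, 128, 64.
model = SentenceTransformer("akashmaggon/bge-base-financial-matryoshka", truncate_dim=256)

embeddings = model.encode([
    "What was the total net earnings for Johnson & Johnson in 2023?",
    "Johnson & Johnson's consolidated statements of earnings for 2023 reported total net earnings of $35,153 million.",
])
print(embeddings.shape)
# (2, 256)

# Truncated vectors are no longer unit-length, but the default cosine
# similarity re-normalizes internally, so ranking still behaves as expected.
print(model.similarity(embeddings, embeddings))
```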

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval
* Dataset: `dim_768`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6957     |
| cosine_accuracy@3   | 0.8371     |
| cosine_accuracy@5   | 0.8714     |
| cosine_accuracy@10  | 0.9243     |
| cosine_precision@1  | 0.6957     |
| cosine_precision@3  | 0.279      |
| cosine_precision@5  | 0.1743     |
| cosine_precision@10 | 0.0924     |
| cosine_recall@1     | 0.6957     |
| cosine_recall@3     | 0.8371     |
| cosine_recall@5     | 0.8714     |
| cosine_recall@10    | 0.9243     |
| cosine_ndcg@10      | 0.8105     |
| cosine_mrr@10       | 0.7742     |
| **cosine_map@100**  | **0.7773** |

#### Information Retrieval
* Dataset: `dim_512`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7        |
| cosine_accuracy@3   | 0.8286     |
| cosine_accuracy@5   | 0.8671     |
| cosine_accuracy@10  | 0.9186     |
| cosine_precision@1  | 0.7        |
| cosine_precision@3  | 0.2762     |
| cosine_precision@5  | 0.1734     |
| cosine_precision@10 | 0.0919     |
| cosine_recall@1     | 0.7        |
| cosine_recall@3     | 0.8286     |
| cosine_recall@5     | 0.8671     |
| cosine_recall@10    | 0.9186     |
| cosine_ndcg@10      | 0.809      |
| cosine_mrr@10       | 0.774      |
| **cosine_map@100**  | **0.7776** |

#### Information Retrieval
* Dataset: `dim_256`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6929     |
| cosine_accuracy@3   | 0.8186     |
| cosine_accuracy@5   | 0.8586     |
| cosine_accuracy@10  | 0.91       |
| cosine_precision@1  | 0.6929     |
| cosine_precision@3  | 0.2729     |
| cosine_precision@5  | 0.1717     |
| cosine_precision@10 | 0.091      |
| cosine_recall@1     | 0.6929     |
| cosine_recall@3     | 0.8186     |
| cosine_recall@5     | 0.8586     |
| cosine_recall@10    | 0.91       |
| cosine_ndcg@10      | 0.8017     |
| cosine_mrr@10       | 0.767      |
| **cosine_map@100**  | **0.7712** |

#### Information Retrieval
* Dataset: `dim_128`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6871     |
| cosine_accuracy@3   | 0.8071     |
| cosine_accuracy@5   | 0.8586     |
| cosine_accuracy@10  | 0.8986     |
| cosine_precision@1  | 0.6871     |
| cosine_precision@3  | 0.269      |
| cosine_precision@5  | 0.1717     |
| cosine_precision@10 | 0.0899     |
| cosine_recall@1     | 0.6871     |
| cosine_recall@3     | 0.8071     |
| cosine_recall@5     | 0.8586     |
| cosine_recall@10    | 0.8986     |
| cosine_ndcg@10      | 0.7921     |
| cosine_mrr@10       | 0.7581     |
| **cosine_map@100**  | **0.7627** |

#### Information Retrieval
* Dataset: `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.6643     |
| cosine_accuracy@3   | 0.7843     |
| cosine_accuracy@5   | 0.8257     |
| cosine_accuracy@10  | 0.8729     |
| cosine_precision@1  | 0.6643     |
| cosine_precision@3  | 0.2614     |
| cosine_precision@5  | 0.1651     |
| cosine_precision@10 | 0.0873     |
| cosine_recall@1     | 0.6643     |
| cosine_recall@3     | 0.7843     |
| cosine_recall@5     | 0.8257     |
| cosine_recall@10    | 0.8729     |
| cosine_ndcg@10      | 0.769      |
| cosine_mrr@10       | 0.7358     |
| **cosine_map@100**  | **0.7407** |

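All five reports come from the same retrieval test split, re-encoded at each Matryoshka dimension. A sketch of how such numbers can be reproduced with `InformationRetrievalEvaluator`; the toy query/corpus dictionaries and the `truncate_dim` loop here are illustrative stand-ins, not the exact evaluation script:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Illustrative stand-ins: queries are "anchor" questions, the corpus holds
# "positive" passages, and relevant_docs maps query IDs to relevant corpus IDs.
queries = {"q1": "What was the total net earnings for Johnson & Johnson in 2023?"}
corpus = {"d1": "Johnson & Johnson's consolidated statements of earnings for 2023 reported total net earnings of $35,153 million."}
relevant_docs = {"q1": {"d1"}}

for dim in (768, 512, 256, 128, 64):
    model = SentenceTransformer("akashmaggon/bge-base-financial-matryoshka", truncate_dim=dim)
    evaluator = InformationRetrievalEvaluator(
        queries=queries,
        corpus=corpus,
        relevant_docs=relevant_docs,
        name=f"dim_{dim}",
    )
    print(dim, evaluator(model))  # accuracy@k, precision/recall@k, ndcg@10, mrr@10, map@100
```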

<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 6,300 training samples
* Columns: <code>positive</code> and <code>anchor</code>
* Approximate statistics based on the first 1000 samples:
  |         | positive                                                                            | anchor                                                                             |
  |:--------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
  | type    | string                                                                              | string                                                                             |
  | details | <ul><li>min: 7 tokens</li><li>mean: 44.39 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 20.64 tokens</li><li>max: 51 tokens</li></ul> |
* Samples:
  | positive                                                                                                                                      | anchor                                                                                                                      |
  |:----------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------|
  | <code>Johnson & Johnson reported cash and cash equivalents of $21,859 million as of the end of 2023.</code>                                  | <code>What was the amount of cash and cash equivalents reported by Johnson & Johnson at the end of 2023?</code>            |
  | <code>Johnson & Johnson's consolidated statements of earnings for 2023 reported total net earnings of $35,153 million.</code>                | <code>What was the total net earnings for Johnson & Johnson in 2023?</code>                                                |
  | <code>As of December 31, 2023, short-term investments were valued at $236,118 thousand and long-term investments at $86,676 thousand.</code> | <code>What is the total value of short-term and long-term investments held by the company as of December 31, 2023?</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```
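
This configuration corresponds to wrapping `MultipleNegativesRankingLoss` (in-batch negatives over the anchor/positive pairs) in `MatryoshkaLoss`, which applies the inner loss at every listed dimension with equal weight. A minimal sketch of the equivalent construction:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Ranks each anchor's true positive above all other in-batch passages.
inner_loss = MultipleNegativesRankingLoss(model)

# Applies inner_loss to embeddings truncated to each dimension, all weighted 1,
# so the leading 64/128/256/512 dimensions stay useful on their own.
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
```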

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 4
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
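
For orientation, the non-default values above slot into the Sentence Transformers 3.0 trainer roughly as follows. This is a sketch, not the exact training script: the `output_dir` and the single-row dataset are placeholders, and `eval_strategy`/`load_best_model_at_end` are omitted because they additionally require an evaluation dataset or evaluator:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
loss = MatryoshkaLoss(
    model, MultipleNegativesRankingLoss(model), matryoshka_dims=[768, 512, 256, 128, 64]
)

# Placeholder for the 6,300-row dataset with "positive" and "anchor" columns.
train_dataset = Dataset.from_dict({
    "positive": ["Johnson & Johnson reported cash and cash equivalents of $21,859 million as of the end of 2023."],
    "anchor": ["What was the amount of cash and cash equivalents reported by Johnson & Johnson at the end of 2023?"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-financial-matryoshka",  # placeholder path
    num_train_epochs=4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate in-batch negatives
)

trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=loss
)
trainer.train()
```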

### Training Logs
| Epoch      | Step   | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
| 0.8122     | 10     | 1.5779        | -                      | -                      | -                      | -                     | -                      |
| 0.9746     | 12     | -             | 0.7388                 | 0.7509                 | 0.7604                 | 0.7081                | 0.7579                 |
| 1.6244     | 20     | 0.6572        | -                      | -                      | -                      | -                     | -                      |
| 1.9492     | 24     | -             | 0.7612                 | 0.7670                 | 0.7729                 | 0.7269                | 0.7705                 |
| 2.4365     | 30     | 0.4661        | -                      | -                      | -                      | -                     | -                      |
| 2.9239     | 36     | -             | 0.7623                 | 0.7702                 | 0.7771                 | 0.7386                | 0.7758                 |
| 3.2487     | 40     | 0.3774        | -                      | -                      | -                      | -                     | -                      |
| **3.8985** | **48** | **-**         | **0.7627**             | **0.7712**             | **0.7776**             | **0.7407**            | **0.7773**             |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.3.1+cu121
- Accelerate: 0.32.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->

config.json ADDED

{
  "_name_or_path": "BAAI/bge-base-en-v1.5",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.41.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

config_sentence_transformers.json ADDED

{
  "__version__": {
    "sentence_transformers": "3.0.1",
    "transformers": "4.41.2",
    "pytorch": "2.3.1+cu121"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": null
}

model.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:e232bb945ad1629abb00ae10af12914f8527da2d1fd8987588361b178b0968bf
size 437951328

modules.json ADDED

[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]

sentence_bert_config.json ADDED

{
  "max_seq_length": 512,
  "do_lower_case": true
}

special_tokens_map.json ADDED

{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}

vocab.txt ADDED

The diff for this file is too large to render.