ChristianBernhard committed on
Commit e5e6e8e · verified · 1 Parent(s): 2966527

Add new SentenceTransformer model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+{
+  "word_embedding_dimension": 768,
+  "pooling_mode_cls_token": true,
+  "pooling_mode_mean_tokens": false,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false,
+  "pooling_mode_weightedmean_tokens": false,
+  "pooling_mode_lasttoken": false,
+  "include_prompt": true
+}
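This pooling config enables only `pooling_mode_cls_token`, so each sentence embedding is the final hidden state of the first (`[CLS]`) token rather than an average over tokens. A minimal NumPy sketch of CLS pooling next to mean pooling (shown only for contrast, since it is disabled here), followed by L2 normalization. It assumes a `(batch, seq_len, 768)` array of token embeddings plus an attention mask; random data stands in for real BERT outputs, and the helper names are hypothetical.

```python
import numpy as np


def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """CLS pooling: keep the hidden state of the first token of every sequence."""
    return token_embeddings[:, 0, :]


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Mean pooling over non-padding tokens (for comparison only; disabled in this config)."""
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    return summed / counts


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


rng = np.random.default_rng(0)
batch, seq_len, dim = 2, 6, 768
token_embeddings = rng.normal(size=(batch, seq_len, dim)).astype(np.float32)
attention_mask = np.array([[1, 1, 1, 1, 0, 0],
                           [1, 1, 1, 1, 1, 1]])

sentence_embeddings = l2_normalize(cls_pool(token_embeddings))
print(sentence_embeddings.shape)  # (2, 768)
print(np.linalg.norm(sentence_embeddings, axis=1))  # each entry close to 1.0
```

With CLS pooling, padding never affects the result because position 0 is always the `[CLS]` token; mean pooling needs the mask to ignore padded positions.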
README.md ADDED
@@ -0,0 +1,734 @@
+---
+language:
+- en
+license: apache-2.0
+tags:
+- sentence-transformers
+- sentence-similarity
+- feature-extraction
+- generated_from_trainer
+- dataset_size:6300
+- loss:MatryoshkaLoss
+- loss:MultipleNegativesRankingLoss
+base_model: BAAI/bge-base-en-v1.5
+widget:
+- source_sentence: What was the total amount of current assets reported by The Hershey
+    Company for the year 2023?
+  sentences:
+  - The total AUS for all categories, including alternative investments, equity, fixed
+    income, and liquidity products, summed up to $2,812 billion in 2023.
+  - The Hershey Company reported a total of current assets amounting to $2,912,103
+    for the year 2023.
+  - Information on legal proceedings is included in Note 15 to the Consolidated Financial
+    Statements.
+- source_sentence: What is listed under Item 8 in the document?
+  sentences:
+  - Chubb Limited further advanced their goal of greater product, customer, and geographical
+    diversification with incremental purchases that led to a controlling majority
+    interest in Huatai Insurance Group Co. Ltd, owning about 76.5 percent as of July
+    1, 2023.
+  - Item 8 includes Financial Statements and Supplementary Data.
+  - Further, state attorneys general may bring civil actions seeking either injunction
+    or an unspecified amount in damages in response to violations of the HIPAA privacy
+    and security regulations.
+- source_sentence: What were the main factors contributing to the change in net sales
+    for fiscal 2022?
+  sentences:
+  - The decrease in consolidated net sales in fiscal 2022 compared to fiscal 2021
+    was primarily attributable to the translation impact of a stronger U.S. dollar,
+    a decline in sales from new software releases and video game accessories, partially
+    offset by an increase in sales of new gaming hardware and toys and collectibles.
+  - We receive payment from the delivery partner subsequent to the transfer of food
+    and the payment terms are short-term in nature.
+  - Net cash used in investing activities was $30.0 million in the year ended December
+    31, 2022, and increased to $73.3 million in the year ended December 31, 2023.
+- source_sentence: What informs the ESG disclosures mentioned in the text?
+  sentences:
+  - Common Equity Tier 1 (CET1) Capital refers to the total of common stock and related
+    surplus minus treasury stock, retained earnings, AOCI, and qualifying minority
+    interests after factoring in the necessary regulatory adjustments and deductions.
+  - Constant currency revenue percentage change is calculated by determining the change
+    in current period revenues over prior period revenues where current period foreign
+    currency revenues are translated using prior year exchange outstanding rates and
+    hedging effects are excluded from revenues of both periods.
+  - Our ESG disclosures are also informed by relevant topics identified through third-party
+    ESG reporting organizations, frameworks and standards, such as the TCFD.
+- source_sentence: How many new aircraft did Delta Air Lines take delivery of in 2023?
+  sentences:
+  - In 2023, Delta took delivery of 43 aircraft.
+  - The listing of our common stock on the NYSE could potentially create a conflict
+    between the exchange’s regulatory responsibilities to vigorously oversee the listing
+    and trading of securities, on the one hand, and our commercial and economic interest,
+    on the other hand.
+  - 'The Company''s enterprise DEI Strategy is aligned to the DEI Vision and Mission
+    and rests on four core pillars: •Build a workforce of individuals with diverse
+    backgrounds, cultures, abilities and perspectives •Foster a culture of inclusion
+    where every individual belongs •Transform talent and business processes to achieve
+    equitable opportunities for all •Drive innovation and growth with our business
+    to serve diverse markets around the world.'
+pipeline_tag: sentence-similarity
+library_name: sentence-transformers
+metrics:
+- cosine_accuracy@1
+- cosine_accuracy@3
+- cosine_accuracy@5
+- cosine_accuracy@10
+- cosine_precision@1
+- cosine_precision@3
+- cosine_precision@5
+- cosine_precision@10
+- cosine_recall@1
+- cosine_recall@3
+- cosine_recall@5
+- cosine_recall@10
+- cosine_ndcg@10
+- cosine_mrr@10
+- cosine_map@100
+model-index:
+- name: BGE base Financial Matryoshka
+  results:
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: dim 768
+      type: dim_768
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.7
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.8328571428571429
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.8614285714285714
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.9171428571428571
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.7
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.2776190476190476
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.17228571428571426
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.09171428571428569
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.7
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.8328571428571429
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.8614285714285714
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.9171428571428571
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.8082439242024833
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.7734971655328796
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.7770743874539329
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: dim 512
+      type: dim_512
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.6914285714285714
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.8328571428571429
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.8685714285714285
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.9185714285714286
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.6914285714285714
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.2776190476190476
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.1737142857142857
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.09185714285714283
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.6914285714285714
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.8328571428571429
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.8685714285714285
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.9185714285714286
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.8056533729911755
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.7695113378684802
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.7731633620598676
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: dim 256
+      type: dim_256
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.6928571428571428
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.8328571428571429
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.87
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.91
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.6928571428571428
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.2776190476190476
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.174
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.09099999999999998
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.6928571428571428
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.8328571428571429
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.87
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.91
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.8031697277454632
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.7687063492063488
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.772758974076829
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: dim 128
+      type: dim_128
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.67
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.8028571428571428
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.8628571428571429
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.9057142857142857
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.67
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.2676190476190476
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.17257142857142854
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.09057142857142855
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.67
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.8028571428571428
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.8628571428571429
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.9057142857142857
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.7882417708737697
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.7505816326530609
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.7545140112362249
+      name: Cosine Map@100
+  - task:
+      type: information-retrieval
+      name: Information Retrieval
+    dataset:
+      name: dim 64
+      type: dim_64
+    metrics:
+    - type: cosine_accuracy@1
+      value: 0.6557142857142857
+      name: Cosine Accuracy@1
+    - type: cosine_accuracy@3
+      value: 0.7871428571428571
+      name: Cosine Accuracy@3
+    - type: cosine_accuracy@5
+      value: 0.8171428571428572
+      name: Cosine Accuracy@5
+    - type: cosine_accuracy@10
+      value: 0.8742857142857143
+      name: Cosine Accuracy@10
+    - type: cosine_precision@1
+      value: 0.6557142857142857
+      name: Cosine Precision@1
+    - type: cosine_precision@3
+      value: 0.2623809523809524
+      name: Cosine Precision@3
+    - type: cosine_precision@5
+      value: 0.16342857142857142
+      name: Cosine Precision@5
+    - type: cosine_precision@10
+      value: 0.08742857142857141
+      name: Cosine Precision@10
+    - type: cosine_recall@1
+      value: 0.6557142857142857
+      name: Cosine Recall@1
+    - type: cosine_recall@3
+      value: 0.7871428571428571
+      name: Cosine Recall@3
+    - type: cosine_recall@5
+      value: 0.8171428571428572
+      name: Cosine Recall@5
+    - type: cosine_recall@10
+      value: 0.8742857142857143
+      name: Cosine Recall@10
+    - type: cosine_ndcg@10
+      value: 0.7637005971170125
+      name: Cosine Ndcg@10
+    - type: cosine_mrr@10
+      value: 0.7285300453514736
+      name: Cosine Mrr@10
+    - type: cosine_map@100
+      value: 0.7336775414052045
+      name: Cosine Map@100
+---
+
+# BGE base Financial Matryoshka
+
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+## Model Details
+
+### Model Description
+- **Model Type:** Sentence Transformer
+- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+- **Maximum Sequence Length:** 512 tokens
+- **Output Dimensionality:** 768 dimensions
+- **Similarity Function:** Cosine Similarity
+- **Training Dataset:**
+    - json
+- **Language:** en
+- **License:** apache-2.0
+
+### Model Sources
+
+- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+### Full Model Architecture
+
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+  (2): Normalize()
+)
+```
+
+## Usage
+
+### Direct Usage (Sentence Transformers)
+
+First install the Sentence Transformers library:
+
+```bash
+pip install -U sentence-transformers
+```
+
+Then you can load this model and run inference.
+```python
+from sentence_transformers import SentenceTransformer
+
+# Download from the 🤗 Hub
+model = SentenceTransformer("ChristianBernhard/bge-base-financial-matryoshka")
+# Run inference
+sentences = [
+    'How many new aircraft did Delta Air Lines take delivery of in 2023?',
+    'In 2023, Delta took delivery of 43 aircraft.',
+    'The listing of our common stock on the NYSE could potentially create a conflict between the exchange’s regulatory responsibilities to vigorously oversee the listing and trading of securities, on the one hand, and our commercial and economic interest, on the other hand.',
+]
+embeddings = model.encode(sentences)
+print(embeddings.shape)
+# [3, 768]
+
+# Get the similarity scores for the embeddings
+similarities = model.similarity(embeddings, embeddings)
+print(similarities.shape)
+# [3, 3]
+```
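Because the model was trained with `MatryoshkaLoss` over 768, 512, 256, 128 and 64 dimensions, its embeddings are meant to remain usable when only the first *k* components are kept. Sentence Transformers exposes a `truncate_dim` argument on the `SentenceTransformer` constructor for this; the sketch below does the same thing by hand with NumPy so it runs without downloading the model. Random unit vectors stand in for `model.encode(...)` output, and `truncate_and_normalize` is a hypothetical helper.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and rescale each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity of every row of `a` against every row of `b` (rows already unit length)."""
    return a @ b.T


# Stand-in for model.encode(sentences): 3 unit-length 768-d vectors.
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(3, 768))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

for dim in (768, 512, 256, 128, 64):
    small = truncate_and_normalize(embeddings, dim)
    sims = cosine_matrix(small, small)
    print(dim, small.shape, sims.shape)
```

Re-normalizing after truncation matters because a prefix of a unit vector is generally shorter than 1, so a plain dot product would no longer equal cosine similarity. The Evaluation section of this card shows how much retrieval quality each smaller size gives up.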
+
+<!--
+### Direct Usage (Transformers)
+
+<details><summary>Click to see the direct usage in Transformers</summary>
+
+</details>
+-->
+
+<!--
+### Downstream Usage (Sentence Transformers)
+
+You can finetune this model on your own dataset.
+
+<details><summary>Click to expand</summary>
+
+</details>
+-->
+
+<!--
+### Out-of-Scope Use
+
+*List how the model may foreseeably be misused and address what users ought not to do with the model.*
+-->
+
+## Evaluation
+
+### Metrics
+
+#### Information Retrieval
+
+* Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
+* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+| Metric | dim_768 | dim_512 | dim_256 | dim_128 | dim_64 |
+|:--------------------|:-----------|:-----------|:-----------|:-----------|:-----------|
+| cosine_accuracy@1 | 0.7 | 0.6914 | 0.6929 | 0.67 | 0.6557 |
+| cosine_accuracy@3 | 0.8329 | 0.8329 | 0.8329 | 0.8029 | 0.7871 |
+| cosine_accuracy@5 | 0.8614 | 0.8686 | 0.87 | 0.8629 | 0.8171 |
+| cosine_accuracy@10 | 0.9171 | 0.9186 | 0.91 | 0.9057 | 0.8743 |
+| cosine_precision@1 | 0.7 | 0.6914 | 0.6929 | 0.67 | 0.6557 |
+| cosine_precision@3 | 0.2776 | 0.2776 | 0.2776 | 0.2676 | 0.2624 |
+| cosine_precision@5 | 0.1723 | 0.1737 | 0.174 | 0.1726 | 0.1634 |
+| cosine_precision@10 | 0.0917 | 0.0919 | 0.091 | 0.0906 | 0.0874 |
+| cosine_recall@1 | 0.7 | 0.6914 | 0.6929 | 0.67 | 0.6557 |
+| cosine_recall@3 | 0.8329 | 0.8329 | 0.8329 | 0.8029 | 0.7871 |
+| cosine_recall@5 | 0.8614 | 0.8686 | 0.87 | 0.8629 | 0.8171 |
+| cosine_recall@10 | 0.9171 | 0.9186 | 0.91 | 0.9057 | 0.8743 |
+| **cosine_ndcg@10** | **0.8082** | **0.8057** | **0.8032** | **0.7882** | **0.7637** |
+| cosine_mrr@10 | 0.7735 | 0.7695 | 0.7687 | 0.7506 | 0.7285 |
+| cosine_map@100 | 0.7771 | 0.7732 | 0.7728 | 0.7545 | 0.7337 |
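In this table recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example 0.8329 / 3 ≈ 0.2776), which is what the definitions give when every query has exactly one relevant passage. Under that single-relevant-passage assumption, the metrics reduce to simple functions of the rank at which the relevant passage is retrieved. A small pure-Python sketch; the ranks are made up for illustration and `single_relevant_metrics` is a hypothetical helper, not the evaluator's code.

```python
import math
from typing import Dict, List, Optional


def single_relevant_metrics(ranks: List[Optional[int]], k: int = 10) -> Dict[str, float]:
    """Retrieval metrics when each query has exactly one relevant passage.

    `ranks[i]` is the 1-based position of query i's relevant passage in the
    ranked results, or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        f"accuracy@{k}": accuracy,
        f"precision@{k}": accuracy / k,  # at most 1 relevant item among k results
        f"recall@{k}": accuracy,         # only 1 relevant item exists per query
        f"mrr@{k}": sum(1 / r for r, h in zip(ranks, hits) if h) / n,
        # The ideal DCG is 1 (relevant item at rank 1), so NDCG = 1 / log2(rank + 1).
        f"ndcg@{k}": sum(1 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }


# Made-up ranks for four queries: hit at 1, hit at 3, hit at 12, not retrieved.
example_ranks = [1, 3, 12, None]
for name, value in single_relevant_metrics(example_ranks, k=10).items():
    print(f"{name}: {value:.4f}")
```

Under the same assumption, average precision for a query is 1 / rank, so MAP@100 can only exceed MRR@10 through relevant passages found at ranks 11 to 100, which matches the slightly higher map@100 values in the table.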
467
+
468
+ <!--
469
+ ## Bias, Risks and Limitations
470
+
471
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
472
+ -->
473
+
474
+ <!--
475
+ ### Recommendations
476
+
477
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
478
+ -->
479
+
480
+ ## Training Details
481
+
482
+ ### Training Dataset
483
+
484
+ #### json
485
+
486
+ * Dataset: json
487
+ * Size: 6,300 training samples
488
+ * Columns: <code>anchor</code> and <code>positive</code>
489
+ * Approximate statistics based on the first 1000 samples:
490
+ | | anchor | positive |
491
+ |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
492
+ | type | string | string |
493
+ | details | <ul><li>min: 9 tokens</li><li>mean: 20.82 tokens</li><li>max: 41 tokens</li></ul> | <ul><li>min: 9 tokens</li><li>mean: 47.65 tokens</li><li>max: 371 tokens</li></ul> |
494
+ * Samples:
495
+ | anchor | positive |
496
+ |:-------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
497
+ | <code>What challenges did the company face in its supply chain during fiscal 2021?</code> | <code>During fiscal 2021, we experienced significant disruptions in our supply chain which impacted our ability to ship products from overseas on a timely basis.</code> |
498
+ | <code>Is the information on Legal proceedings in the report straightforward or referenced to another section?</code> | <code>The information on Legal proceedings called for by Item 3 is incorporated by reference to Note 19 of the Notes to Consolidated Financial Statements in Item 8 of the report.</code> |
499
+ | <code>What factors particularly influence sales comparisons and comparable sales growth according to the annual report?</code> | <code>Sales comparisons can also be particularly influenced by certain factors that are beyond our control: fluctuations in currency exchange rates (with respect to our international operations); inflation or deflation and changes in the cost of gasoline and associated competitive conditions.</code> |
500
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
501
+ ```json
502
+ {
503
+ "loss": "MultipleNegativesRankingLoss",
504
+ "matryoshka_dims": [
505
+ 768,
506
+ 512,
507
+ 256,
508
+ 128,
509
+ 64
510
+ ],
511
+ "matryoshka_weights": [
512
+ 1,
513
+ 1,
514
+ 1,
515
+ 1,
516
+ 1
517
+ ],
518
+ "n_dims_per_step": -1
519
+ }
520
+ ```
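With these parameters, `MatryoshkaLoss` applies the wrapped `MultipleNegativesRankingLoss` to the embeddings truncated to each size in `matryoshka_dims` and adds the results, weighted by `matryoshka_weights` (all 1 here); `n_dims_per_step: -1` means every dimension is used at every step. The NumPy sketch below shows only that forward computation, assuming in-batch negatives, cosine similarity and a scale of 20 (the default `scale` of `MultipleNegativesRankingLoss`). It is an illustration, not the library's PyTorch implementation, and the function names are hypothetical.

```python
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def mnrl(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """Multiple negatives ranking loss: cross-entropy where anchor i must pick positive i
    out of all positives in the batch (the other rows act as in-batch negatives)."""
    logits = scale * cosine_sim(anchors, positives)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def matryoshka_loss(anchors, positives, dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Weighted sum of MNRL computed on embeddings truncated to each dimension."""
    return sum(w * mnrl(anchors[:, :d], positives[:, :d]) for d, w in zip(dims, weights))


rng = np.random.default_rng(0)
anchors = rng.normal(size=(32, 768))                    # a batch of 32 anchor embeddings
positives = anchors + 0.1 * rng.normal(size=(32, 768))  # matching positives, slightly perturbed
print(matryoshka_loss(anchors, positives))
```

Because each truncated prefix is scored on its own, the leading dimensions are pushed to carry enough information for retrieval by themselves, which is what makes the truncated evaluations reported in this card possible.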
+
+### Training Hyperparameters
+#### Non-Default Hyperparameters
+
+- `eval_strategy`: epoch
+- `per_device_train_batch_size`: 32
+- `per_device_eval_batch_size`: 16
+- `gradient_accumulation_steps`: 16
+- `learning_rate`: 2e-05
+- `num_train_epochs`: 4
+- `lr_scheduler_type`: cosine
+- `warmup_ratio`: 0.1
+- `bf16`: True
+- `tf32`: True
+- `load_best_model_at_end`: True
+- `optim`: adamw_torch_fused
+- `batch_sampler`: no_duplicates
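With `per_device_train_batch_size` 32 and `gradient_accumulation_steps` 16, each optimizer step sees 32 × 16 = 512 (anchor, positive) pairs per device, so 6,300 samples give only about 12 optimizer steps per epoch and 48 over 4 epochs; a `warmup_ratio` of 0.1 then amounts to a handful of warmup steps. The sketch below reproduces that arithmetic and a linear-warmup plus cosine-decay learning-rate curve. It assumes a single device, that the last partial batch is kept (`dataloader_drop_last`: False), and the shape of the `transformers` cosine schedule (`get_cosine_schedule_with_warmup` with half a cycle, warmup steps rounded up); it is an illustration, not the Trainer's code.

```python
import math

num_samples = 6_300
per_device_batch = 32
grad_accum = 16
epochs = 4
peak_lr = 2e-5
warmup_ratio = 0.1

# Assumes one device and that the last, partial batch is kept.
batches_per_epoch = math.ceil(num_samples / per_device_batch)  # 197
steps_per_epoch = batches_per_epoch // grad_accum               # 12 optimizer steps
total_steps = steps_per_epoch * epochs                          # 48
warmup_steps = math.ceil(warmup_ratio * total_steps)            # 5
effective_batch = per_device_batch * grad_accum                 # 512


def lr_at(step: int) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch, batches_per_epoch, steps_per_epoch, total_steps, warmup_steps)
for step in (0, 5, 12, 24, 36, 48):
    print(step, f"{lr_at(step):.2e}")
```

The large effective batch also matters for `MultipleNegativesRankingLoss` only within each 32-sample mini-batch: gradient accumulation sums gradients across mini-batches but does not enlarge the pool of in-batch negatives.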
+
+#### All Hyperparameters
+<details><summary>Click to expand</summary>
+
+- `overwrite_output_dir`: False
+- `do_predict`: False
+- `eval_strategy`: epoch
+- `prediction_loss_only`: True
+- `per_device_train_batch_size`: 32
+- `per_device_eval_batch_size`: 16
+- `per_gpu_train_batch_size`: None
+- `per_gpu_eval_batch_size`: None
+- `gradient_accumulation_steps`: 16
+- `eval_accumulation_steps`: None
+- `learning_rate`: 2e-05
+- `weight_decay`: 0.0
+- `adam_beta1`: 0.9
+- `adam_beta2`: 0.999
+- `adam_epsilon`: 1e-08
+- `max_grad_norm`: 1.0
+- `num_train_epochs`: 4
+- `max_steps`: -1
+- `lr_scheduler_type`: cosine
+- `lr_scheduler_kwargs`: {}
+- `warmup_ratio`: 0.1
+- `warmup_steps`: 0
+- `log_level`: passive
+- `log_level_replica`: warning
+- `log_on_each_node`: True
+- `logging_nan_inf_filter`: True
+- `save_safetensors`: True
+- `save_on_each_node`: False
+- `save_only_model`: False
+- `restore_callback_states_from_checkpoint`: False
+- `no_cuda`: False
+- `use_cpu`: False
+- `use_mps_device`: False
+- `seed`: 42
+- `data_seed`: None
+- `jit_mode_eval`: False
+- `use_ipex`: False
+- `bf16`: True
+- `fp16`: False
+- `fp16_opt_level`: O1
+- `half_precision_backend`: auto
+- `bf16_full_eval`: False
+- `fp16_full_eval`: False
+- `tf32`: True
+- `local_rank`: 0
+- `ddp_backend`: None
+- `tpu_num_cores`: None
+- `tpu_metrics_debug`: False
+- `debug`: []
+- `dataloader_drop_last`: False
+- `dataloader_num_workers`: 0
+- `dataloader_prefetch_factor`: None
+- `past_index`: -1
+- `disable_tqdm`: False
+- `remove_unused_columns`: True
+- `label_names`: None
+- `load_best_model_at_end`: True
+- `ignore_data_skip`: False
+- `fsdp`: []
+- `fsdp_min_num_params`: 0
+- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+- `fsdp_transformer_layer_cls_to_wrap`: None
+- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+- `deepspeed`: None
+- `label_smoothing_factor`: 0.0
+- `optim`: adamw_torch_fused
+- `optim_args`: None
+- `adafactor`: False
+- `group_by_length`: False
+- `length_column_name`: length
+- `ddp_find_unused_parameters`: None
+- `ddp_bucket_cap_mb`: None
+- `ddp_broadcast_buffers`: False
+- `dataloader_pin_memory`: True
+- `dataloader_persistent_workers`: False
+- `skip_memory_metrics`: True
+- `use_legacy_prediction_loop`: False
+- `push_to_hub`: False
+- `resume_from_checkpoint`: None
+- `hub_model_id`: None
+- `hub_strategy`: every_save
+- `hub_private_repo`: False
+- `hub_always_push`: False
+- `gradient_checkpointing`: False
+- `gradient_checkpointing_kwargs`: None
+- `include_inputs_for_metrics`: False
+- `eval_do_concat_batches`: True
+- `fp16_backend`: auto
+- `push_to_hub_model_id`: None
+- `push_to_hub_organization`: None
+- `mp_parameters`: 
+- `auto_find_batch_size`: False
+- `full_determinism`: False
+- `torchdynamo`: None
+- `ray_scope`: last
+- `ddp_timeout`: 1800
+- `torch_compile`: False
+- `torch_compile_backend`: None
+- `torch_compile_mode`: None
+- `dispatch_batches`: None
+- `split_batches`: None
+- `include_tokens_per_second`: False
+- `include_num_input_tokens_seen`: False
+- `neftune_noise_alpha`: None
+- `optim_target_modules`: None
+- `batch_eval_metrics`: False
+- `prompts`: None
+- `batch_sampler`: no_duplicates
+- `multi_dataset_batch_sampler`: proportional
+
+</details>
+
+### Training Logs
+| Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
+|:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
+| 0.8122 | 10 | 1.5819 | - | - | - | - | - |
+| 0.9746 | 12 | - | 0.7909 | 0.7912 | 0.7907 | 0.7723 | 0.7444 |
+| 1.6244 | 20 | 0.6676 | - | - | - | - | - |
+| 1.9492 | 24 | - | 0.7991 | 0.7994 | 0.7983 | 0.7849 | 0.7571 |
+| 2.4365 | 30 | 0.4321 | - | - | - | - | - |
+| 2.9239 | 36 | - | 0.8089 | 0.8048 | 0.8016 | 0.7879 | 0.7637 |
+| 3.2487 | 40 | 0.3958 | - | - | - | - | - |
+| **3.8985** | **48** | **-** | **0.8082** | **0.8057** | **0.8032** | **0.7882** | **0.7637** |
+
+* The bold row denotes the saved checkpoint.
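The fractional epochs in the log are consistent with one epoch being 197 mini-batches of 32 (6,300 / 32, rounded up) and each logged optimizer step consuming 16 of them (`gradient_accumulation_steps`): step 12 corresponds to 12 × 16 / 197 ≈ 0.9746. A quick check of every logged row, assuming a single device:

```python
import math

batches_per_epoch = math.ceil(6_300 / 32)  # 197 mini-batches, assuming one device
grad_accum = 16

# (step, epoch) pairs copied from the training log above
logged = [(10, 0.8122), (12, 0.9746), (20, 1.6244), (24, 1.9492),
          (30, 2.4365), (36, 2.9239), (40, 3.2487), (48, 3.8985)]

for step, logged_epoch in logged:
    epoch = step * grad_accum / batches_per_epoch
    print(f"step {step:2d}: computed {epoch:.4f}, logged {logged_epoch}")
```

This also explains why the final checkpoint sits at epoch 3.8985 rather than 4.0: the last 5 mini-batches of each epoch (197 − 12 × 16) do not fill a complete accumulation window.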
+
+### Framework Versions
+- Python: 3.10.12
+- Sentence Transformers: 3.3.1
+- Transformers: 4.41.2
+- PyTorch: 2.1.2+cu121
+- Accelerate: 1.2.0
+- Datasets: 2.19.1
+- Tokenizers: 0.19.1
+
+## Citation
+
+### BibTeX
+
+#### Sentence Transformers
+```bibtex
+@inproceedings{reimers-2019-sentence-bert,
+    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+    author = "Reimers, Nils and Gurevych, Iryna",
+    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+    month = "11",
+    year = "2019",
+    publisher = "Association for Computational Linguistics",
+    url = "https://arxiv.org/abs/1908.10084",
+}
+```
+
+#### MatryoshkaLoss
+```bibtex
+@misc{kusupati2024matryoshka,
+    title={Matryoshka Representation Learning},
+    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+    year={2024},
+    eprint={2205.13147},
+    archivePrefix={arXiv},
+    primaryClass={cs.LG}
+}
+```
+
+#### MultipleNegativesRankingLoss
+```bibtex
+@misc{henderson2017efficient,
+    title={Efficient Natural Language Response Suggestion for Smart Reply},
+    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+    year={2017},
+    eprint={1705.00652},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```
+
+<!--
+## Glossary
+
+*Clearly define terms in order to be accessible across audiences.*
+-->
+
+<!--
+## Model Card Authors
+
+*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+-->
+
+<!--
+## Model Card Contact
+
+*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+-->
config.json ADDED
@@ -0,0 +1,32 @@
+{
+  "_name_or_path": "BAAI/bge-base-en-v1.5",
+  "architectures": [
+    "BertModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "classifier_dropout": null,
+  "gradient_checkpointing": false,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "id2label": {
+    "0": "LABEL_0"
+  },
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "label2id": {
+    "LABEL_0": 0
+  },
+  "layer_norm_eps": 1e-12,
+  "max_position_embeddings": 512,
+  "model_type": "bert",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 0,
+  "position_embedding_type": "absolute",
+  "torch_dtype": "float32",
+  "transformers_version": "4.41.2",
+  "type_vocab_size": 2,
+  "use_cache": true,
+  "vocab_size": 30522
+}
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+{
+  "__version__": {
+    "sentence_transformers": "3.3.1",
+    "transformers": "4.41.2",
+    "pytorch": "2.1.2+cu121"
+  },
+  "prompts": {},
+  "default_prompt_name": null,
+  "similarity_fn_name": "cosine"
+}
model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4385f6a4c6a158177f4bfe822af95bc52e52f6ce3b913f170bdd553715cbfffe
+size 437951328
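The LFS pointer records a 437,951,328-byte `model.safetensors`, which is consistent with `config.json`: counting the parameters of a BERT encoder with those dimensions (embeddings, 12 layers, and the pooler that `BertModel` includes by default) gives 109,482,240 weights, or 437,928,960 bytes in float32 (`torch_dtype`), leaving about 22 KB, presumably the safetensors JSON header. The counting function below is a sketch that assumes the standard Hugging Face BERT layout (biases on every linear layer, two LayerNorms per layer, learned absolute position embeddings); `bert_param_count` is a hypothetical helper.

```python
def bert_param_count(cfg: dict, with_pooler: bool = True) -> int:
    """Count parameters of a Hugging Face-style BERT encoder from its config values."""
    h = cfg["hidden_size"]
    i = cfg["intermediate_size"]
    layer_norm = 2 * h  # weight + bias

    embeddings = (
        cfg["vocab_size"] * h
        + cfg["max_position_embeddings"] * h
        + cfg["type_vocab_size"] * h
        + layer_norm
    )
    attention = 4 * (h * h + h) + layer_norm  # Q, K, V, output projection + LayerNorm
    feed_forward = (h * i + i) + (i * h + h) + layer_norm
    encoder = cfg["num_hidden_layers"] * (attention + feed_forward)
    pooler = (h * h + h) if with_pooler else 0
    return embeddings + encoder + pooler


config = {  # values copied from config.json in this commit
    "hidden_size": 768,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "num_hidden_layers": 12,
    "type_vocab_size": 2,
    "vocab_size": 30522,
}

params = bert_param_count(config)
tensor_bytes = params * 4  # float32
lfs_size = 437_951_328     # "size" field of the LFS pointer
print(params, tensor_bytes, lfs_size - tensor_bytes)  # 109482240 437928960 22368
```

Note that the pooler weights are stored but unused at inference here, since sentence embeddings come from the `Pooling` module's CLS selection rather than `BertModel`'s pooler output.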
modules.json ADDED
@@ -0,0 +1,20 @@
+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  }
+]
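`modules.json` chains three modules by `idx`: the BERT `Transformer` (weights at the repository root, `"path": ""`), the CLS `Pooling` module from `1_Pooling`, and a final `Normalize`. Because that last step scales every embedding to unit L2 norm, the cosine similarity named in `config_sentence_transformers.json` reduces to a plain dot product. A NumPy check of that identity, with random vectors standing in for pooled outputs:

```python
import numpy as np

rng = np.random.default_rng(7)
pooled = rng.normal(size=(4, 768))  # stand-in for Pooling outputs (CLS vectors)

# Normalize step: divide each row by its L2 norm.
normalized = pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# Cosine similarity computed explicitly on the un-normalized vectors...
norms = np.linalg.norm(pooled, axis=1)
cosine = (pooled @ pooled.T) / np.outer(norms, norms)

# ...matches the plain dot product of the normalized vectors.
dot = normalized @ normalized.T
print(np.allclose(cosine, dot))  # True
```

In practice this means an inner-product vector index returns the same ranking as a cosine index for full 768-dimensional embeddings; after truncating to a smaller Matryoshka size the vectors are no longer unit length and need re-normalizing first.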
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+{
+  "max_seq_length": 512,
+  "do_lower_case": true
+}
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_basic_tokenize": true,
+  "do_lower_case": true,
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "never_split": null,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "unk_token": "[UNK]"
+}
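`tokenizer_config.json` fixes the special-token ids (`[PAD]`=0, `[UNK]`=100, `[CLS]`=101, `[SEP]`=102, `[MASK]`=103) and caps inputs at `model_max_length` 512. A BERT tokenizer wraps each input as `[CLS] … [SEP]` and right-pads with `[PAD]`, which is why the CLS pooling configured in `1_Pooling/config.json` can always read position 0. The pure-Python sketch below shows only that framing and the attention mask; the word-piece ids in it are made-up placeholders rather than real lookups in `vocab.txt`, and `frame_batch` is a hypothetical helper.

```python
from typing import List, Tuple

PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # from tokenizer_config.json
MAX_LEN = 512                         # model_max_length


def frame_batch(token_ids: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Add [CLS]/[SEP], truncate to MAX_LEN, and right-pad with [PAD] to the longest sequence."""
    framed = [[CLS_ID] + ids[: MAX_LEN - 2] + [SEP_ID] for ids in token_ids]
    width = max(len(seq) for seq in framed)
    input_ids = [seq + [PAD_ID] * (width - len(seq)) for seq in framed]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in framed]
    return input_ids, attention_mask


# Placeholder word-piece ids for two inputs of different length (not real vocab.txt lookups).
ids, mask = frame_batch([[2001, 2002, 2003], [2004]])
print(ids)   # [[101, 2001, 2002, 2003, 102], [101, 2004, 102, 0, 0]]
print(mask)  # [[1, 1, 1, 1, 1], [1, 1, 1, 0, 0]]
```

Inputs longer than 510 word pieces lose their tail under this scheme, so long filings would need to be chunked before encoding.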
vocab.txt ADDED
The diff for this file is too large to render.