CarlosElArtista committed (verified)
Commit: 33aeadb
Parent(s): 2d95aa0

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,885 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:6300
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: BAAI/bge-base-en-v1.5
+ widget:
+ - source_sentence: Chevron provides long-standing employee support programs such as
+     Ombuds, an independent resource, a company hotline for reporting concerns, and
+     the Employee Assistance Program, a confidential consulting service for a range
+     of personal, family, and work-related concerns.
+   sentences:
+   - What is the effective date for the new accounting standard on equity securities
+     for public entities?
+   - What programs does Chevron have to support employee well-being and address workplace
+     issues?
+   - What type of service is provided by Walmart in Mexico to enhance digital connectivity?
+ - source_sentence: ProConnect Tax Online is our cloud-based solution, which is designed
+     for full-service, year-round practices who prepare all forms of consumer and small
+     business returns and integrates with our QuickBooks Online offerings.
+   sentences:
+   - What is the significance of the Company’s trademarks to their businesses?
+   - What are the features of Intuit's ProConnect Tax Online service?
+   - Where can information regarding legal proceedings be found in the document?
+ - source_sentence: The section titled 'Financial Wtatement and Supplementary Data'
+     is labeled with the number 39 in the document.
+   sentences:
+   - What is the numerical label associated with the section on Financial Statements
+     and Supplementary Data in the document?
+   - Why did the effective tax rate increase in 2022 compared to 2021?
+   - What role does intellectual property play in Nike's competitive position?
+ - source_sentence: Our operating cash inflows include cash from vehicle sales and
+     related servicing, customer lease and financing payments, customer deposits, cash
+     from sales of regulatory credits and energy generation and storage products, and
+     interest income on our cash and investments portfolio.
+   sentences:
+   - What was the net increase in cash and cash equivalents for the year ending December
+     30, 2023?
+   - What are the requirements for health insurers and group health plans in providing
+     cost estimates to consumers?
+   - What are the sources of operating cash inflows?
+ - source_sentence: Symtuza (darunavir/C/FTC/TAF), a fixed dose combination product
+     that includes cobicistat ('C'), emtricitabine ('FTC'), and tenofovir alafenamide
+     ('TAF'), is commercialized by Janssen Sciences Ireland Unlimited Company.
+   sentences:
+   - What are the primary drugs included in Symtuza and which company commercializes
+     it?
+   - What was reported as the percentage revenue increase for the Asia Pacific & Latin
+     America segment of NIKE from fiscal 2022 to fiscal 2023?
+   - What are the main factors influencing competition for the company's products?
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: BGE base Financial Matryoshka
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 768
+       type: dim_768
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.67
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8071428571428572
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8485714285714285
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8985714285714286
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.67
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.26904761904761904
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16971428571428568
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08985714285714284
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.67
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8071428571428572
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8485714285714285
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8985714285714286
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7849037198632751
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7484699546485256
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7522833636034203
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 512
+       type: dim_512
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6657142857142857
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8085714285714286
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8414285714285714
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8942857142857142
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6657142857142857
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.26952380952380955
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16828571428571426
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08942857142857143
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6657142857142857
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8085714285714286
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8414285714285714
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8942857142857142
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7816751594389505
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7455107709750564
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7495566091259342
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 256
+       type: dim_256
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6528571428571428
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8042857142857143
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8357142857142857
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8957142857142857
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6528571428571428
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2680952380952381
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16714285714285712
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08957142857142857
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6528571428571428
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8042857142857143
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8357142857142857
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8957142857142857
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7751159904165151
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7365447845804987
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7402062124507567
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 128
+       type: dim_128
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6442857142857142
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.7885714285714286
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.83
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8857142857142857
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6442857142857142
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.26285714285714284
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16599999999999998
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08857142857142856
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6442857142857142
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.7885714285714286
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.83
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8857142857142857
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7673388064771406
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7293316326530613
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7335797814707157
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 64
+       type: dim_64
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6057142857142858
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.78
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8214285714285714
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.8814285714285715
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6057142857142858
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.26
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16428571428571426
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.08814285714285712
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6057142857142858
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.78
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8214285714285714
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.8814285714285715
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7451487636214842
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7013752834467117
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7052270125234881
+       name: Cosine Map@100
+ ---
+
+ # BGE base Financial Matryoshka
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - json
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
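+ The listing above reads as a pipeline: BERT encodes the text, the Pooling module keeps the `[CLS]` token's hidden state (`pooling_mode_cls_token: True`), and Normalize L2-normalizes the result. A minimal sketch of the equivalent computation with plain `transformers` (an illustration, not a file shipped in this repository):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoTokenizer, AutoModel
+
+ tokenizer = AutoTokenizer.from_pretrained("CarlosElArtista/bge-base-financial-matryoshka")
+ model = AutoModel.from_pretrained("CarlosElArtista/bge-base-financial-matryoshka")
+
+ inputs = tokenizer(
+     ["What are the sources of operating cash inflows?"],
+     padding=True, truncation=True, max_length=512, return_tensors="pt",
+ )
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # (1) Pooling: pooling_mode_cls_token takes the first token's hidden state
+ cls_embedding = outputs.last_hidden_state[:, 0]
+ # (2) Normalize: L2-normalize, so cosine similarity reduces to a dot product
+ embedding = F.normalize(cls_embedding, p=2, dim=1)
+ print(embedding.shape)  # torch.Size([1, 768])
+ ```
+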
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("CarlosElArtista/bge-base-financial-matryoshka")
+ # Run inference
+ sentences = [
+     "Symtuza (darunavir/C/FTC/TAF), a fixed dose combination product that includes cobicistat ('C'), emtricitabine ('FTC'), and tenofovir alafenamide ('TAF'), is commercialized by Janssen Sciences Ireland Unlimited Company.",
+     'What are the primary drugs included in Symtuza and which company commercializes it?',
+     'What was reported as the percentage revenue increase for the Asia Pacific & Latin America segment of NIKE from fiscal 2022 to fiscal 2023?',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
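+
+ Because the model was trained with MatryoshkaLoss, its embeddings can also be truncated to a smaller dimensionality (512, 256, 128, or 64) for cheaper storage and search, at the modest quality cost shown in the Evaluation section below. A sketch using the library's `truncate_dim` option:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Load the same model, but keep only the first 256 embedding dimensions
+ model = SentenceTransformer("CarlosElArtista/bge-base-financial-matryoshka", truncate_dim=256)
+ embeddings = model.encode(["What are the sources of operating cash inflows?"])
+ print(embeddings.shape)
+ # (1, 256)
+ ```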
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Datasets: `dim_768`, `dim_512`, `dim_256`, `dim_128` and `dim_64`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | dim_768    | dim_512    | dim_256    | dim_128    | dim_64     |
+ |:--------------------|:-----------|:-----------|:-----------|:-----------|:-----------|
+ | cosine_accuracy@1   | 0.67       | 0.6657     | 0.6529     | 0.6443     | 0.6057     |
+ | cosine_accuracy@3   | 0.8071     | 0.8086     | 0.8043     | 0.7886     | 0.78       |
+ | cosine_accuracy@5   | 0.8486     | 0.8414     | 0.8357     | 0.83       | 0.8214     |
+ | cosine_accuracy@10  | 0.8986     | 0.8943     | 0.8957     | 0.8857     | 0.8814     |
+ | cosine_precision@1  | 0.67       | 0.6657     | 0.6529     | 0.6443     | 0.6057     |
+ | cosine_precision@3  | 0.269      | 0.2695     | 0.2681     | 0.2629     | 0.26       |
+ | cosine_precision@5  | 0.1697     | 0.1683     | 0.1671     | 0.166      | 0.1643     |
+ | cosine_precision@10 | 0.0899     | 0.0894     | 0.0896     | 0.0886     | 0.0881     |
+ | cosine_recall@1     | 0.67       | 0.6657     | 0.6529     | 0.6443     | 0.6057     |
+ | cosine_recall@3     | 0.8071     | 0.8086     | 0.8043     | 0.7886     | 0.78       |
+ | cosine_recall@5     | 0.8486     | 0.8414     | 0.8357     | 0.83       | 0.8214     |
+ | cosine_recall@10    | 0.8986     | 0.8943     | 0.8957     | 0.8857     | 0.8814     |
+ | **cosine_ndcg@10**  | **0.7849** | **0.7817** | **0.7751** | **0.7673** | **0.7451** |
+ | cosine_mrr@10       | 0.7485     | 0.7455     | 0.7365     | 0.7293     | 0.7014     |
+ | cosine_map@100      | 0.7523     | 0.7496     | 0.7402     | 0.7336     | 0.7052     |
+
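+ These scores come from the evaluator linked above. A sketch of how such an evaluation can be reproduced (the `queries`, `corpus`, and `relevant_docs` mappings below are placeholders; the held-out evaluation split itself is not shipped in this repository):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+
+ model = SentenceTransformer("CarlosElArtista/bge-base-financial-matryoshka", truncate_dim=256)
+
+ queries = {"q1": "What are the sources of operating cash inflows?"}  # query id -> question
+ corpus = {"d1": "Our operating cash inflows include cash from vehicle sales and ..."}  # doc id -> passage
+ relevant_docs = {"q1": {"d1"}}  # query id -> set of relevant doc ids
+
+ evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dim_256")
+ print(evaluator(model))  # e.g. {'dim_256_cosine_ndcg@10': ..., ...}
+ ```
+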
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### json
+
+ * Dataset: json
+ * Size: 6,300 training samples
+ * Columns: <code>positive</code> and <code>anchor</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | positive                                                                           | anchor                                                                            |
+   |:--------|:-----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                            |
+   | details | <ul><li>min: 8 tokens</li><li>mean: 46.05 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 2 tokens</li><li>mean: 20.55 tokens</li><li>max: 51 tokens</li></ul> |
+ * Samples:
+   | positive | anchor |
+   |:---------|:-------|
+   | <code>The AMPTC for microinverters decreases by 25% each year beginning in 2030 and ending after 2032.</code> | <code>What is the trajectory of the AMPTC for microinverters starting in 2030?</code> |
+   | <code>results. Legal and Other Contingencies The Company is subject to various legal proceedings and claims that arise in the ordinary course of business, the outcomes of which are inherently uncertain. The Company records a liability when it is probable that a loss has been incurred and the amount is reasonably estimable, the determination of which requires significant judgment. Resolution of legal matters in a manner inconsistent with management’s expectations could have a material impact on the Company’s financial condition and operating results. Apple Inc. \| 2023 Form 10-K \| 25</code> | <code>What does the Company face in the ordinary course of business related to legal matters?</code> |
+   | <code>In 2023, the company recorded other operating charges of $1,951 million.</code> | <code>What was the total amount of other operating charges recorded by the company in 2023?</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+ ```
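+
+ A sketch of how this loss configuration is assembled in code: MatryoshkaLoss wraps MultipleNegativesRankingLoss and applies it to the leading 768, 512, 256, 128, and 64 dimensions of each embedding, which is what makes truncated embeddings usable on their own:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
+
+ model = SentenceTransformer("BAAI/bge-base-en-v1.5")
+ # In-batch-negatives ranking loss, applied at every Matryoshka dimension
+ inner_loss = MultipleNegativesRankingLoss(model)
+ loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
+ ```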
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 4
+ - `gradient_accumulation_steps`: 4
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 4
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `tf32`: False
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+
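+ A sketch mapping these settings onto the trainer API (assuming the `model` and `loss` from the previous sketch and a `train_dataset` with anchor/positive columns; `save_strategy="epoch"` is an assumption, needed so `load_best_model_at_end` can pair with the epoch-level evaluation):
+
+ ```python
+ from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
+ from sentence_transformers.training_args import BatchSamplers
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="bge-base-financial-matryoshka",
+     num_train_epochs=4,
+     per_device_train_batch_size=4,
+     per_device_eval_batch_size=4,
+     gradient_accumulation_steps=4,
+     learning_rate=2e-5,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.1,
+     bf16=True,
+     tf32=False,
+     eval_strategy="epoch",
+     save_strategy="epoch",  # assumed, to match eval_strategy for load_best_model_at_end
+     load_best_model_at_end=True,
+     optim="adamw_torch_fused",
+     batch_sampler=BatchSamplers.NO_DUPLICATES,  # no duplicate anchors within a batch
+ )
+ trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
+ trainer.train()
+ ```
+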
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 4
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 4
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: False
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
+ |:-------:|:--------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
+ | 0.0254 | 10 | 0.3873 | - | - | - | - | - |
+ | 0.0508 | 20 | 0.1907 | - | - | - | - | - |
+ | 0.0762 | 30 | 0.3031 | - | - | - | - | - |
+ | 0.1016 | 40 | 0.3314 | - | - | - | - | - |
+ | 0.1270 | 50 | 0.3452 | - | - | - | - | - |
+ | 0.1524 | 60 | 0.1831 | - | - | - | - | - |
+ | 0.1778 | 70 | 0.1286 | - | - | - | - | - |
+ | 0.2032 | 80 | 0.1162 | - | - | - | - | - |
+ | 0.2286 | 90 | 0.1464 | - | - | - | - | - |
+ | 0.2540 | 100 | 0.0409 | - | - | - | - | - |
+ | 0.2794 | 110 | 0.0886 | - | - | - | - | - |
+ | 0.3048 | 120 | 0.0964 | - | - | - | - | - |
+ | 0.3302 | 130 | 0.175 | - | - | - | - | - |
+ | 0.3556 | 140 | 0.1102 | - | - | - | - | - |
+ | 0.3810 | 150 | 0.0705 | - | - | - | - | - |
+ | 0.4063 | 160 | 0.0892 | - | - | - | - | - |
+ | 0.4317 | 170 | 0.1246 | - | - | - | - | - |
+ | 0.4571 | 180 | 0.0924 | - | - | - | - | - |
+ | 0.4825 | 190 | 0.05 | - | - | - | - | - |
+ | 0.5079 | 200 | 0.0676 | - | - | - | - | - |
+ | 0.5333 | 210 | 0.0746 | - | - | - | - | - |
+ | 0.5587 | 220 | 0.2014 | - | - | - | - | - |
+ | 0.5841 | 230 | 0.0568 | - | - | - | - | - |
+ | 0.6095 | 240 | 0.118 | - | - | - | - | - |
+ | 0.6349 | 250 | 0.0833 | - | - | - | - | - |
+ | 0.6603 | 260 | 0.1091 | - | - | - | - | - |
+ | 0.6857 | 270 | 0.1108 | - | - | - | - | - |
+ | 0.7111 | 280 | 0.1026 | - | - | - | - | - |
+ | 0.7365 | 290 | 0.1485 | - | - | - | - | - |
+ | 0.7619 | 300 | 0.0888 | - | - | - | - | - |
+ | 0.7873 | 310 | 0.0366 | - | - | - | - | - |
+ | 0.8127 | 320 | 0.0717 | - | - | - | - | - |
+ | 0.8381 | 330 | 0.0703 | - | - | - | - | - |
+ | 0.8635 | 340 | 0.0531 | - | - | - | - | - |
+ | 0.8889 | 350 | 0.0488 | - | - | - | - | - |
+ | 0.9143 | 360 | 0.0321 | - | - | - | - | - |
+ | 0.9397 | 370 | 0.1364 | - | - | - | - | - |
+ | 0.9651 | 380 | 0.2325 | - | - | - | - | - |
+ | 0.9905 | 390 | 0.0346 | - | - | - | - | - |
+ | 1.0 | 394 | - | 0.7833 | 0.7757 | 0.7692 | 0.7525 | 0.7314 |
+ | 1.0152 | 400 | 0.0742 | - | - | - | - | - |
+ | 1.0406 | 410 | 0.0147 | - | - | - | - | - |
+ | 1.0660 | 420 | 0.0777 | - | - | - | - | - |
+ | 1.0914 | 430 | 0.0353 | - | - | - | - | - |
+ | 1.1168 | 440 | 0.0093 | - | - | - | - | - |
+ | 1.1422 | 450 | 0.1484 | - | - | - | - | - |
+ | 1.1676 | 460 | 0.0167 | - | - | - | - | - |
+ | 1.1930 | 470 | 0.0039 | - | - | - | - | - |
+ | 1.2184 | 480 | 0.007 | - | - | - | - | - |
+ | 1.2438 | 490 | 0.0043 | - | - | - | - | - |
+ | 1.2692 | 500 | 0.0156 | - | - | - | - | - |
+ | 1.2946 | 510 | 0.0519 | - | - | - | - | - |
+ | 1.32 | 520 | 0.0163 | - | - | - | - | - |
+ | 1.3454 | 530 | 0.0214 | - | - | - | - | - |
+ | 1.3708 | 540 | 0.0025 | - | - | - | - | - |
+ | 1.3962 | 550 | 0.0129 | - | - | - | - | - |
+ | 1.4216 | 560 | 0.0045 | - | - | - | - | - |
+ | 1.4470 | 570 | 0.0025 | - | - | - | - | - |
+ | 1.4724 | 580 | 0.0023 | - | - | - | - | - |
+ | 1.4978 | 590 | 0.0114 | - | - | - | - | - |
+ | 1.5232 | 600 | 0.0636 | - | - | - | - | - |
+ | 1.5486 | 610 | 0.0066 | - | - | - | - | - |
+ | 1.5740 | 620 | 0.0112 | - | - | - | - | - |
+ | 1.5994 | 630 | 0.0087 | - | - | - | - | - |
+ | 1.6248 | 640 | 0.0026 | - | - | - | - | - |
+ | 1.6502 | 650 | 0.017 | - | - | - | - | - |
+ | 1.6756 | 660 | 0.0741 | - | - | - | - | - |
+ | 1.7010 | 670 | 0.0041 | - | - | - | - | - |
+ | 1.7263 | 680 | 0.0339 | - | - | - | - | - |
+ | 1.7517 | 690 | 0.003 | - | - | - | - | - |
+ | 1.7771 | 700 | 0.0052 | - | - | - | - | - |
+ | 1.8025 | 710 | 0.0464 | - | - | - | - | - |
+ | 1.8279 | 720 | 0.0015 | - | - | - | - | - |
+ | 1.8533 | 730 | 0.0169 | - | - | - | - | - |
+ | 1.8787 | 740 | 0.0178 | - | - | - | - | - |
+ | 1.9041 | 750 | 0.0033 | - | - | - | - | - |
+ | 1.9295 | 760 | 0.0165 | - | - | - | - | - |
+ | 1.9549 | 770 | 0.0091 | - | - | - | - | - |
+ | 1.9803 | 780 | 0.1162 | - | - | - | - | - |
+ | 2.0 | 788 | - | 0.7849 | 0.7820 | 0.7764 | 0.7661 | 0.7469 |
+ | 2.0051 | 790 | 0.0077 | - | - | - | - | - |
+ | 2.0305 | 800 | 0.0024 | - | - | - | - | - |
+ | 2.0559 | 810 | 0.0025 | - | - | - | - | - |
+ | 2.0813 | 820 | 0.0032 | - | - | - | - | - |
+ | 2.1067 | 830 | 0.0022 | - | - | - | - | - |
+ | 2.1321 | 840 | 0.0428 | - | - | - | - | - |
+ | 2.1575 | 850 | 0.0027 | - | - | - | - | - |
+ | 2.1829 | 860 | 0.0015 | - | - | - | - | - |
+ | 2.2083 | 870 | 0.0028 | - | - | - | - | - |
+ | 2.2337 | 880 | 0.0006 | - | - | - | - | - |
+ | 2.2590 | 890 | 0.0005 | - | - | - | - | - |
+ | 2.2844 | 900 | 0.0025 | - | - | - | - | - |
+ | 2.3098 | 910 | 0.002 | - | - | - | - | - |
+ | 2.3352 | 920 | 0.002 | - | - | - | - | - |
+ | 2.3606 | 930 | 0.0105 | - | - | - | - | - |
+ | 2.3860 | 940 | 0.0061 | - | - | - | - | - |
+ | 2.4114 | 950 | 0.0017 | - | - | - | - | - |
+ | 2.4368 | 960 | 0.0009 | - | - | - | - | - |
+ | 2.4622 | 970 | 0.0007 | - | - | - | - | - |
+ | 2.4876 | 980 | 0.001 | - | - | - | - | - |
+ | 2.5130 | 990 | 0.0008 | - | - | - | - | - |
+ | 2.5384 | 1000 | 0.044 | - | - | - | - | - |
+ | 2.5638 | 1010 | 0.0012 | - | - | - | - | - |
+ | 2.5892 | 1020 | 0.0103 | - | - | - | - | - |
+ | 2.6146 | 1030 | 0.0003 | - | - | - | - | - |
+ | 2.64 | 1040 | 0.0005 | - | - | - | - | - |
+ | 2.6654 | 1050 | 0.0972 | - | - | - | - | - |
+ | 2.6908 | 1060 | 0.0011 | - | - | - | - | - |
+ | 2.7162 | 1070 | 0.0093 | - | - | - | - | - |
+ | 2.7416 | 1080 | 0.0028 | - | - | - | - | - |
+ | 2.7670 | 1090 | 0.0004 | - | - | - | - | - |
+ | 2.7924 | 1100 | 0.0231 | - | - | - | - | - |
+ | 2.8178 | 1110 | 0.0021 | - | - | - | - | - |
+ | 2.8432 | 1120 | 0.0013 | - | - | - | - | - |
+ | 2.8686 | 1130 | 0.0012 | - | - | - | - | - |
+ | 2.8940 | 1140 | 0.002 | - | - | - | - | - |
+ | 2.9194 | 1150 | 0.001 | - | - | - | - | - |
+ | 2.9448 | 1160 | 0.007 | - | - | - | - | - |
+ | 2.9702 | 1170 | 0.018 | - | - | - | - | - |
+ | 2.9956 | 1180 | 0.001 | - | - | - | - | - |
+ | **3.0** | **1182** | **-** | **0.7832** | **0.7823** | **0.7754** | **0.7682** | **0.744** |
+ | 3.0203 | 1190 | 0.0028 | - | - | - | - | - |
+ | 3.0457 | 1200 | 0.0005 | - | - | - | - | - |
+ | 3.0711 | 1210 | 0.0007 | - | - | - | - | - |
+ | 3.0965 | 1220 | 0.0008 | - | - | - | - | - |
+ | 3.1219 | 1230 | 0.0123 | - | - | - | - | - |
+ | 3.1473 | 1240 | 0.0014 | - | - | - | - | - |
+ | 3.1727 | 1250 | 0.0005 | - | - | - | - | - |
+ | 3.1981 | 1260 | 0.0003 | - | - | - | - | - |
+ | 3.2235 | 1270 | 0.0006 | - | - | - | - | - |
+ | 3.2489 | 1280 | 0.0004 | - | - | - | - | - |
+ | 3.2743 | 1290 | 0.0007 | - | - | - | - | - |
+ | 3.2997 | 1300 | 0.0011 | - | - | - | - | - |
+ | 3.3251 | 1310 | 0.0006 | - | - | - | - | - |
+ | 3.3505 | 1320 | 0.0019 | - | - | - | - | - |
+ | 3.3759 | 1330 | 0.0006 | - | - | - | - | - |
+ | 3.4013 | 1340 | 0.0011 | - | - | - | - | - |
+ | 3.4267 | 1350 | 0.0006 | - | - | - | - | - |
+ | 3.4521 | 1360 | 0.0006 | - | - | - | - | - |
+ | 3.4775 | 1370 | 0.0004 | - | - | - | - | - |
+ | 3.5029 | 1380 | 0.0007 | - | - | - | - | - |
+ | 3.5283 | 1390 | 0.0383 | - | - | - | - | - |
+ | 3.5537 | 1400 | 0.0007 | - | - | - | - | - |
+ | 3.5790 | 1410 | 0.0019 | - | - | - | - | - |
+ | 3.6044 | 1420 | 0.0038 | - | - | - | - | - |
+ | 3.6298 | 1430 | 0.0007 | - | - | - | - | - |
+ | 3.6552 | 1440 | 0.0463 | - | - | - | - | - |
+ | 3.6806 | 1450 | 0.0373 | - | - | - | - | - |
+ | 3.7060 | 1460 | 0.0007 | - | - | - | - | - |
+ | 3.7314 | 1470 | 0.0022 | - | - | - | - | - |
+ | 3.7568 | 1480 | 0.0005 | - | - | - | - | - |
+ | 3.7822 | 1490 | 0.0007 | - | - | - | - | - |
+ | 3.8076 | 1500 | 0.0177 | - | - | - | - | - |
+ | 3.8330 | 1510 | 0.0006 | - | - | - | - | - |
+ | 3.8584 | 1520 | 0.0009 | - | - | - | - | - |
+ | 3.8838 | 1530 | 0.0012 | - | - | - | - | - |
+ | 3.9092 | 1540 | 0.0009 | - | - | - | - | - |
+ | 3.9346 | 1550 | 0.0012 | - | - | - | - | - |
+ | 3.96 | 1560 | 0.0004 | - | - | - | - | - |
+ | 3.9854 | 1570 | 0.0064 | - | - | - | - | - |
+ | 3.9905 | 1572 | - | 0.7849 | 0.7817 | 0.7751 | 0.7673 | 0.7451 |
+
+ * The bold row denotes the saved checkpoint.
+ </details>
+
+ ### Framework Versions
+ - Python: 3.12.8
+ - Sentence Transformers: 3.3.1
+ - Transformers: 4.47.1
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.2.1
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.47.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.3.1",
+     "transformers": "4.47.1",
+     "pytorch": "2.5.1+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1d622a1c180323a995d3b8ac0c14df0f8c091a66df8b502a6a5f9e8912517331
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff