rezarahim committed (verified)
Commit e5d453c · 1 parent: 7545d8c

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
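This pooling configuration keeps only the `[CLS]` token embedding (every other pooling mode is disabled), and the model card below adds L2 normalization afterwards. A minimal sketch of what that amounts to, using plain `transformers` on the base checkpoint (illustrative only; loading the finetuned repository with Sentence Transformers does this for you):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

# Illustrative reproduction of CLS-token pooling as configured above,
# shown on the base checkpoint (the finetuned weights live in this repo).
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-base-en-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-base-en-v1.5")

inputs = tokenizer(["How does CLS pooling work?"], return_tensors="pt",
                   padding=True, truncation=True, max_length=512)
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # (batch, seq_len, 768)

# pooling_mode_cls_token: the sentence embedding is the first token's hidden state
sentence_embedding = token_embeddings[:, 0]                # (batch, 768)
# the Normalize() module in the architecture L2-normalizes the result
sentence_embedding = F.normalize(sentence_embedding, p=2, dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```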
README.md ADDED
@@ -0,0 +1,604 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:178
+ - loss:MultipleNegativesRankingLoss
+ base_model: BAAI/bge-base-en-v1.5
+ widget:
+ - source_sentence: Where can investors find more information about NVIDIA's financial
+     information and company updates?
+   sentences:
+   - ' The potential risks include restrictions on sales of products containing certain
+     components made by Micron, restrictions on receiving supply of components, parts,
+     or services from Taiwan, increased scrutiny from shareholders, regulators, and
+     others regarding corporate sustainability practices, and failure to meet evolving
+     shareholder, regulator, or other industry stakeholder expectations, which could
+     result in additional costs, reputational harm, and loss of customers and suppliers.'
+   - ' Investors and others can find more information about NVIDIA''s financial information
+     and company updates on the company''s investor relations website, through press
+     releases, SEC filings, public conference calls and webcasts, as well as on the
+     company''s social media channels, including Twitter, the NVIDIA Corporate Blog,
+     Facebook, LinkedIn, Instagram, and YouTube.'
+   - ' The text mentions the following forms and agreements: Officers'' Certificate,
+     Form of Note (with various years), Form of Indemnity Agreement, Amended and Restated
+     2007 Equity Incentive Plan, Non-Employee Director Deferred Restricted Stock Unit
+     Grant Notice and Deferred Restricted Stock Unit Agreement, Non-Employee Director
+     Restricted Stock Unit Grant Notice and Restricted Stock Unit Agreement, Global
+     Performance-Based Restricted Stock Unit Grant Notice and Performance-Based Restricted
+     Stock Unit Agreement, Global Restricted Stock Unit Grant Notice and Global Restricted
+     Stock Unit Agreement, and various Schedules and Exhibits (such as 2.1, 3.1, 4.1,
+     4.2, 10.1, 10.2, 10.26, and 10.27).'
+ - source_sentence: What are the potential consequences if regulators in China conclude
+     that NVIDIA has failed to fulfill its commitments or has violated applicable law
+     in China?
+   sentences:
+   - ' The company''s share repurchase program aims to offset dilution from shares
+     issued to employees.'
+   - ' Ms. Shoquist served as Senior Vice President and General Manager of the Electro-Optics
+     business at Coherent, Inc., and previously worked at Quantum Corp. as President
+     of the Personal Computer Hard Disk Drive Division, and at Hewlett-Packard.'
+   - ' If regulators in China conclude that NVIDIA has failed to fulfill its commitments
+     or has violated applicable law in China, the company could be subject to various
+     penalties or restrictions on its ability to conduct its business, which could
+     have a material and adverse impact on its business, operating results, and financial
+     condition.'
+ - source_sentence: What percentage of the company's revenue was attributed to sales
+     to customers outside of the United States in fiscal year 2024?
+   sentences:
+   - ' NVIDIA reports its business results in two segments: the Compute & Networking
+     segment and the Graphics segment.'
+   - ' The company expects to use its existing cash, cash equivalents, and marketable
+     securities, as well as the cash generated by its operations, to fund its capital
+     investments of approximately $3.5 billion to $4.0 billion related to property
+     and equipment during fiscal year 2025.'
+   - ' 56% of the company''s total revenue in fiscal year 2024 was attributed to sales
+     to customers outside of the United States.'
+ - source_sentence: What is the net income per share of NVIDIA Corporation for the
+     year ended January 29, 2023?
+   sentences:
+   - ' 6% of the company''s workforce in the United States is composed of Black or
+     African American employees.'
+   - ' The net income per share of NVIDIA Corporation for the year ended January 29,
+     2023 is $12.05 for basic and $11.93 for diluted.'
+   - ' The company may face potential risks and challenges such as increased expenses,
+     substantial expenditures and time spent to fully resume operations, disruption
+     to product development or operations due to employees being called-up for active
+     military duty, and potential impact on future product development, operations,
+     and revenue. Additionally, the company may also experience interruptions or delays
+     in services from third-party providers, which could impair its ability to provide
+     its products and services and harm its business.'
+ - source_sentence: What percentage of the company's accounts receivable balance as
+     of January 28, 2024, was accounted for by two customers?
+   sentences:
+   - ' The change in equipment and assembly and test equipment resulted in a benefit
+     of $135 million in operating income and $114 million in net income, or $0.05 per
+     both basic and diluted share, for the fiscal year ended January 28, 2024.'
+   - ' The estimates of deferred tax assets and liabilities may change based on added
+     certainty or finality to an anticipated outcome, changes in accounting standards
+     or tax laws in the U.S. or foreign jurisdictions where the company operates, or
+     changes in other facts or circumstances.'
+   - ' 24% and 11%, which is a total of 35%.'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ - dot_accuracy@1
+ - dot_accuracy@3
+ - dot_accuracy@5
+ - dot_accuracy@10
+ - dot_precision@1
+ - dot_precision@3
+ - dot_precision@5
+ - dot_precision@10
+ - dot_recall@1
+ - dot_recall@3
+ - dot_recall@5
+ - dot_recall@10
+ - dot_ndcg@10
+ - dot_mrr@10
+ - dot_map@100
+ model-index:
+ - name: SentenceTransformer based on BAAI/bge-base-en-v1.5
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: bge base en
+       type: bge-base-en
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.9269662921348315
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.9831460674157303
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9943820224719101
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.9269662921348315
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3277153558052434
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.198876404494382
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09999999999999998
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.9269662921348315
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.9831460674157303
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9943820224719101
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9682702490705566
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9575842696629214
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9575842696629213
+       name: Cosine Map@100
+     - type: dot_accuracy@1
+       value: 0.9269662921348315
+       name: Dot Accuracy@1
+     - type: dot_accuracy@3
+       value: 0.9831460674157303
+       name: Dot Accuracy@3
+     - type: dot_accuracy@5
+       value: 0.9943820224719101
+       name: Dot Accuracy@5
+     - type: dot_accuracy@10
+       value: 1.0
+       name: Dot Accuracy@10
+     - type: dot_precision@1
+       value: 0.9269662921348315
+       name: Dot Precision@1
+     - type: dot_precision@3
+       value: 0.3277153558052434
+       name: Dot Precision@3
+     - type: dot_precision@5
+       value: 0.198876404494382
+       name: Dot Precision@5
+     - type: dot_precision@10
+       value: 0.09999999999999998
+       name: Dot Precision@10
+     - type: dot_recall@1
+       value: 0.9269662921348315
+       name: Dot Recall@1
+     - type: dot_recall@3
+       value: 0.9831460674157303
+       name: Dot Recall@3
+     - type: dot_recall@5
+       value: 0.9943820224719101
+       name: Dot Recall@5
+     - type: dot_recall@10
+       value: 1.0
+       name: Dot Recall@10
+     - type: dot_ndcg@10
+       value: 0.9682702490705566
+       name: Dot Ndcg@10
+     - type: dot_mrr@10
+       value: 0.9575842696629214
+       name: Dot Mrr@10
+     - type: dot_map@100
+       value: 0.9575842696629213
+       name: Dot Map@100
+ ---
+
+ # SentenceTransformer based on BAAI/bge-base-en-v1.5
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on the train dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - train
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("rezarahim/bge-finetuned-detail")
+ # Run inference
+ sentences = [
+     "What percentage of the company's accounts receivable balance as of January 28, 2024, was accounted for by two customers?",
+     ' 24% and 11%, which is a total of 35%.',
+     ' The change in equipment and assembly and test equipment resulted in a benefit of $135 million in operating income and $114 million in net income, or $0.05 per both basic and diluted share, for the fiscal year ended January 28, 2024.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+ * Dataset: `bge-base-en`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.927 |
+ | cosine_accuracy@3 | 0.9831 |
+ | cosine_accuracy@5 | 0.9944 |
+ | cosine_accuracy@10 | 1.0 |
+ | cosine_precision@1 | 0.927 |
+ | cosine_precision@3 | 0.3277 |
+ | cosine_precision@5 | 0.1989 |
+ | cosine_precision@10 | 0.1 |
+ | cosine_recall@1 | 0.927 |
+ | cosine_recall@3 | 0.9831 |
+ | cosine_recall@5 | 0.9944 |
+ | cosine_recall@10 | 1.0 |
+ | cosine_ndcg@10 | 0.9683 |
+ | cosine_mrr@10 | 0.9576 |
+ | **cosine_map@100** | **0.9576** |
+ | dot_accuracy@1 | 0.927 |
+ | dot_accuracy@3 | 0.9831 |
+ | dot_accuracy@5 | 0.9944 |
+ | dot_accuracy@10 | 1.0 |
+ | dot_precision@1 | 0.927 |
+ | dot_precision@3 | 0.3277 |
+ | dot_precision@5 | 0.1989 |
+ | dot_precision@10 | 0.1 |
+ | dot_recall@1 | 0.927 |
+ | dot_recall@3 | 0.9831 |
+ | dot_recall@5 | 0.9944 |
+ | dot_recall@10 | 1.0 |
+ | dot_ndcg@10 | 0.9683 |
+ | dot_mrr@10 | 0.9576 |
+ | dot_map@100 | 0.9576 |
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### train
+
+ * Dataset: train
+ * Size: 178 training samples
+ * Columns: <code>anchor</code> and <code>positive</code>
+ * Approximate statistics based on the first 178 samples:
+   |         | anchor | positive |
+   |:--------|:-------|:---------|
+   | type | string | string |
+   | details | <ul><li>min: 10 tokens</li><li>mean: 23.63 tokens</li><li>max: 46 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 66.67 tokens</li><li>max: 313 tokens</li></ul> |
+ * Samples:
+   | anchor | positive |
+   |:-------|:---------|
+   | <code>What is the publication date of the NVIDIA Corporation Annual Report 2024?</code> | <code> The publication date of the NVIDIA Corporation Annual Report 2024 is February 21st, 2024.</code> |
+   | <code>What is the filing date of the 10-K report for NVIDIA Corporation in 2004?</code> | <code> The filing dates of the 10-K reports for NVIDIA Corporation in 2004 are May 20th, March 29th, and April 25th.</code> |
+   | <code>What is the purpose of the section of the filing that requires the registrant to indicate whether it has submitted electronically every Interactive Data File required to be submitted pursuant to Rule 405 of Regulation S-T?</code> | <code> The purpose of this section is to require the registrant to disclose whether it has submitted all required Interactive Data Files electronically, as mandated by Rule 405 of Regulation S-T, during the preceding 12 months or for the shorter period that the registrant was required to submit such files.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 16
+ - `gradient_accumulation_steps`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 25
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 16
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 25
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | bge-base-en_cosine_map@100 |
+ |:-----------:|:------:|:-------------:|:--------------------------:|
+ | 0 | 0 | - | 0.8574 |
+ | 0.7111 | 2 | - | 0.8591 |
+ | 1.7778 | 5 | - | 0.8757 |
+ | 2.8444 | 8 | - | 0.9012 |
+ | 3.5556 | 10 | 0.2885 | - |
+ | 3.9111 | 11 | - | 0.9134 |
+ | 4.9778 | 14 | - | 0.9277 |
+ | 5.6889 | 16 | - | 0.9391 |
+ | 6.7556 | 19 | - | 0.9463 |
+ | 7.1111 | 20 | 0.0644 | - |
+ | 7.8222 | 22 | - | 0.9506 |
+ | 8.8889 | 25 | - | 0.9515 |
+ | 9.9556 | 28 | - | 0.9555 |
+ | 10.6667 | 30 | 0.0333 | 0.9560 |
+ | 11.7333 | 33 | - | 0.9551 |
+ | 12.8 | 36 | - | 0.9569 |
+ | **13.8667** | **39** | **-** | **0.9579** |
+ | 14.2222 | 40 | 0.0157 | - |
+ | 14.9333 | 42 | - | 0.9576 |
+ | 16.0 | 45 | - | 0.9576 |
+ | 16.7111 | 47 | - | 0.9576 |
+ | 17.7778 | 50 | 0.0124 | 0.9576 |
+
+ * The bold row denotes the saved checkpoint.
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.1.1
+ - Transformers: 4.45.2
+ - PyTorch: 2.5.1+cu121
+ - Accelerate: 1.2.1
+ - Datasets: 3.1.0
+ - Tokenizers: 0.20.3
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
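The model card above records the loss (MultipleNegativesRankingLoss, scale 20.0, cosine similarity) and the non-default trainer settings. For readers who want to reproduce a comparable run, here is a hedged sketch with the Sentence Transformers v3 trainer; the three (anchor, positive) pairs are copied from the card's sample table as stand-ins, and the output path is hypothetical, since the real 178-pair train set is not part of this commit.

```python
from datasets import Dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

# A small stand-in for the 178-sample "train" dataset described in the card.
train_dataset = Dataset.from_dict({
    "anchor": [
        "What is the publication date of the NVIDIA Corporation Annual Report 2024?",
        "What is the filing date of the 10-K report for NVIDIA Corporation in 2004?",
        "What percentage of the company's accounts receivable balance as of January 28, 2024, was accounted for by two customers?",
    ],
    "positive": [
        " The publication date of the NVIDIA Corporation Annual Report 2024 is February 21st, 2024.",
        " The filing dates of the 10-K reports for NVIDIA Corporation in 2004 are May 20th, March 29th, and April 25th.",
        " 24% and 11%, which is a total of 35%.",
    ],
})

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
loss = MultipleNegativesRankingLoss(model, scale=20.0)  # cos_sim is the default similarity

args = SentenceTransformerTrainingArguments(
    output_dir="bge-finetuned-detail",          # hypothetical local output path
    num_train_epochs=25,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoids duplicate positives within a batch
)

trainer = SentenceTransformerTrainer(model=model, args=args,
                                     train_dataset=train_dataset, loss=loss)
trainer.train()
model.save("bge-finetuned-detail/final")
```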
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "/content/drive/MyDrive/model/finetune_ret",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.45.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.1",
+     "transformers": "4.45.2",
+     "pytorch": "2.5.1+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc2bfb7461c38daf91066da15ee19a54cda134200712295eb57f8ff48072d373
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
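modules.json chains the three modules in order (Transformer → Pooling → Normalize), with the pooling settings read from `1_Pooling/config.json`. Loading the published model with `SentenceTransformer("rezarahim/bge-finetuned-detail")` rebuilds this pipeline automatically; a rough manual equivalent, sketched against the base checkpoint, would be:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Normalize, Pooling, Transformer

# Approximate manual equivalent of the modules.json pipeline above.
word_embedding_model = Transformer("BAAI/bge-base-en-v1.5", max_seq_length=512)
pooling_model = Pooling(word_embedding_model.get_word_embedding_dimension(),
                        pooling_mode="cls")  # matches 1_Pooling/config.json
normalize = Normalize()

model = SentenceTransformer(modules=[word_embedding_model, pooling_model, normalize])
print(model)  # should mirror the "Full Model Architecture" section of the README
```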
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
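sentence_bert_config.json caps inputs at 512 tokens and lowercases text before tokenization. Longer inputs are truncated rather than rejected; a quick check, assuming the repository id given in the README:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("rezarahim/bge-finetuned-detail")
print(model.max_seq_length)  # 512, read from sentence_bert_config.json

# Inputs beyond 512 tokens are truncated before encoding,
# so the embedding size stays fixed at 768.
very_long_text = "revenue " * 2000
print(model.encode(very_long_text).shape)  # (768,)
```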
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
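The tokenizer files describe a standard lowercasing BERT WordPiece tokenizer with the usual `[CLS]`/`[SEP]`/`[PAD]`/`[UNK]`/`[MASK]` specials and a 512-token limit. A small sketch for inspecting it directly, again assuming the repository id from the README:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("rezarahim/bge-finetuned-detail")
encoded = tokenizer("NVIDIA fiscal year 2024 revenue",
                    truncation=True, max_length=512)

# Lowercased WordPiece tokens, wrapped in [CLS] ... [SEP] as configured above.
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
print(tokenizer.cls_token, tokenizer.sep_token, tokenizer.model_max_length)
```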
vocab.txt ADDED
The diff for this file is too large to render. See raw diff