cngcv committed on
Commit 98ab0c2
1 Parent(s): b910d35

Add new SentenceTransformer model.
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
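The pooling config above enables only CLS-token pooling: the sentence embedding is the hidden state of the first (`[CLS]`) token. A minimal numpy sketch of that behavior (illustrative only, not the sentence-transformers implementation):

```python
import numpy as np

def cls_pooling(token_embeddings: np.ndarray) -> np.ndarray:
    """CLS pooling: return the hidden state of the first ([CLS]) token
    as the sentence embedding. token_embeddings has shape (seq_len, hidden)."""
    return token_embeddings[0]

# Toy input: 4 tokens, 768-dim hidden states (matches word_embedding_dimension)
tokens = np.random.rand(4, 768).astype(np.float32)
sentence_embedding = cls_pooling(tokens)
print(sentence_embedding.shape)  # (768,)
```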
README.md ADDED
@@ -0,0 +1,799 @@
+ ---
+ base_model: BAAI/bge-base-en-v1.5
+ datasets: []
+ language:
+ - en
+ library_name: sentence-transformers
+ license: apache-2.0
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:196
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: The text refers to the preparation of a pre-trained model for data
+     set usage, which is a crucial step in machine learning projects. This suggests
+     that the project involves using a model that has already been trained on a dataset,
+     which can then be fine-tuned or used directly for specific tasks, potentially
+     saving time and computational resources.
+   sentences:
+   - What is the significance of preparing a pre-trained model in the data set for
+     the process described in the text?
+   - What is the purpose of the document?
+   - What are the developer AI developer's experiences in AI development and research?
+ - source_sentence: The project manager has a degree from Vietnam National University
+     and has completed a Google TensorFlow certification.
+   sentences:
+   - How often are the training, evaluation, and re-training steps repeated in the
+     text?
+   - What is the project manager's educational background?
+   - What information should be shared via email when final product delivery is completed?
+ - source_sentence: The text mentions that Docker for the deployment of a high NT Q
+     trained model was built between July 18 and July 19, 2024.
+   sentences:
+   - What is the role of "データベースベクトルとセマンティクス検索モジュール"?
+   - When was the Docker for the deployment of a high NT Q trained model built?
+   - What is the significance of Level 3 in the escalation process described in the
+     text?
+ - source_sentence: The text spans from September 4th to October 16th, covering a total
+     of 33 days.
+   sentences:
+   - How many days are listed in the given text?
+   - How does the system support the current system and plan for future feature development?
+   - What are the two distinct products offered by NT Q?
+ - source_sentence: After text generation, the process involves providing test data
+     to NT Q, which then undergoes article correction, including dealing with fragmented
+     articles and errors.
+   sentences:
+   - What is the process for providing test data to NT Q after text generation?
+   - When is the deadline for combining the API for the setting function?
+   - What is the significance of the dates in the text?
+ model-index:
+ - name: BGE base Financial Matryoshka
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 768
+       type: dim_768
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.7755102040816326
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8775510204081632
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9591836734693877
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9795918367346939
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.7755102040816326
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.2925170068027211
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.19183673469387752
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09795918367346937
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.7755102040816326
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8775510204081632
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9591836734693877
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9795918367346939
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8776251324776435
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.8447845804988664
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.846354439211582
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 512
+       type: dim_512
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.7959183673469388
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.8979591836734694
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9591836734693877
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9795918367346939
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.7959183673469388
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.29931972789115646
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.19183673469387752
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09795918367346937
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.7959183673469388
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.8979591836734694
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9591836734693877
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9795918367346939
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.884559158446073
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.8539358600583091
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.8551363402503859
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 256
+       type: dim_256
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6938775510204082
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.9183673469387755
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9591836734693877
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9591836734693877
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6938775510204082
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3061224489795918
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.19183673469387752
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09591836734693876
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6938775510204082
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.9183673469387755
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9591836734693877
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9591836734693877
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8397332987260313
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7993197278911565
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.8016520894071916
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 128
+       type: dim_128
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.6938775510204082
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.9183673469387755
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.9183673469387755
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9183673469387755
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.6938775510204082
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3061224489795918
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.1836734693877551
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09183673469387756
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.6938775510204082
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.9183673469387755
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.9183673469387755
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9183673469387755
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.8168105921282822
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.7823129251700681
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.7865583396195641
+       name: Cosine Map@100
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: dim 64
+       type: dim_64
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.5918367346938775
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 0.7959183673469388
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 0.8163265306122449
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 0.9183673469387755
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.5918367346938775
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.26530612244897955
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.16326530612244897
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.09183673469387756
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.5918367346938775
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 0.7959183673469388
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 0.8163265306122449
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 0.9183673469387755
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.7471061057082727
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.6929057337220603
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.6978234213668709
+       name: Cosine Map@100
+ ---
+
+ # BGE base Financial Matryoshka
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("cngcv/bge-base-financial-matryoshka")
+ # Run inference
+ sentences = [
+     'After text generation, the process involves providing test data to NT Q, which then undergoes article correction, including dealing with fragmented articles and errors.',
+     'What is the process for providing test data to NT Q after text generation?',
+     'What is the significance of the dates in the text?',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
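Because the model was trained with MatryoshkaLoss, the leading dimensions of each embedding are usable on their own. A hedged sketch in plain numpy (not a sentence-transformers API) of truncating embeddings to a smaller Matryoshka dimension and re-normalizing before computing cosine similarities; the random array stands in for `model.encode(...)` output:

```python
import numpy as np

def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` Matryoshka dimensions and L2-normalize,
    so that plain dot products remain valid cosine similarities."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / norms

# Toy embeddings standing in for model.encode(...) output
full = np.random.rand(3, 768).astype(np.float32)
small = truncate_and_normalize(full, 256)
print(small.shape)         # (3, 256)

# Cosine similarity of normalized vectors is just a matrix product
similarities = small @ small.T
print(similarities.shape)  # (3, 3)
```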
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+ * Dataset: `dim_768`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.7755     |
+ | cosine_accuracy@3   | 0.8776     |
+ | cosine_accuracy@5   | 0.9592     |
+ | cosine_accuracy@10  | 0.9796     |
+ | cosine_precision@1  | 0.7755     |
+ | cosine_precision@3  | 0.2925     |
+ | cosine_precision@5  | 0.1918     |
+ | cosine_precision@10 | 0.098      |
+ | cosine_recall@1     | 0.7755     |
+ | cosine_recall@3     | 0.8776     |
+ | cosine_recall@5     | 0.9592     |
+ | cosine_recall@10    | 0.9796     |
+ | cosine_ndcg@10      | 0.8776     |
+ | cosine_mrr@10       | 0.8448     |
+ | **cosine_map@100**  | **0.8464** |
+
+ #### Information Retrieval
+ * Dataset: `dim_512`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.7959     |
+ | cosine_accuracy@3   | 0.898      |
+ | cosine_accuracy@5   | 0.9592     |
+ | cosine_accuracy@10  | 0.9796     |
+ | cosine_precision@1  | 0.7959     |
+ | cosine_precision@3  | 0.2993     |
+ | cosine_precision@5  | 0.1918     |
+ | cosine_precision@10 | 0.098      |
+ | cosine_recall@1     | 0.7959     |
+ | cosine_recall@3     | 0.898      |
+ | cosine_recall@5     | 0.9592     |
+ | cosine_recall@10    | 0.9796     |
+ | cosine_ndcg@10      | 0.8846     |
+ | cosine_mrr@10       | 0.8539     |
+ | **cosine_map@100**  | **0.8551** |
+
+ #### Information Retrieval
+ * Dataset: `dim_256`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.6939     |
+ | cosine_accuracy@3   | 0.9184     |
+ | cosine_accuracy@5   | 0.9592     |
+ | cosine_accuracy@10  | 0.9592     |
+ | cosine_precision@1  | 0.6939     |
+ | cosine_precision@3  | 0.3061     |
+ | cosine_precision@5  | 0.1918     |
+ | cosine_precision@10 | 0.0959     |
+ | cosine_recall@1     | 0.6939     |
+ | cosine_recall@3     | 0.9184     |
+ | cosine_recall@5     | 0.9592     |
+ | cosine_recall@10    | 0.9592     |
+ | cosine_ndcg@10      | 0.8397     |
+ | cosine_mrr@10       | 0.7993     |
+ | **cosine_map@100**  | **0.8017** |
+
+ #### Information Retrieval
+ * Dataset: `dim_128`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.6939     |
+ | cosine_accuracy@3   | 0.9184     |
+ | cosine_accuracy@5   | 0.9184     |
+ | cosine_accuracy@10  | 0.9184     |
+ | cosine_precision@1  | 0.6939     |
+ | cosine_precision@3  | 0.3061     |
+ | cosine_precision@5  | 0.1837     |
+ | cosine_precision@10 | 0.0918     |
+ | cosine_recall@1     | 0.6939     |
+ | cosine_recall@3     | 0.9184     |
+ | cosine_recall@5     | 0.9184     |
+ | cosine_recall@10    | 0.9184     |
+ | cosine_ndcg@10      | 0.8168     |
+ | cosine_mrr@10       | 0.7823     |
+ | **cosine_map@100**  | **0.7866** |
+
+ #### Information Retrieval
+ * Dataset: `dim_64`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.5918     |
+ | cosine_accuracy@3   | 0.7959     |
+ | cosine_accuracy@5   | 0.8163     |
+ | cosine_accuracy@10  | 0.9184     |
+ | cosine_precision@1  | 0.5918     |
+ | cosine_precision@3  | 0.2653     |
+ | cosine_precision@5  | 0.1633     |
+ | cosine_precision@10 | 0.0918     |
+ | cosine_recall@1     | 0.5918     |
+ | cosine_recall@3     | 0.7959     |
+ | cosine_recall@5     | 0.8163     |
+ | cosine_recall@10    | 0.9184     |
+ | cosine_ndcg@10      | 0.7471     |
+ | cosine_mrr@10       | 0.6929     |
+ | **cosine_map@100**  | **0.6978** |
+
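For reference, the accuracy@k and MRR@10 figures in the tables above follow directly from the rank of the first relevant document per query. A small illustrative implementation with the usual definitions (the `ranks` list is a made-up toy example, not this model's evaluation data):

```python
def accuracy_at_k(first_relevant_ranks, k):
    """Fraction of queries whose first relevant hit appears in the top k.
    Ranks are 1-based."""
    return sum(r <= k for r in first_relevant_ranks) / len(first_relevant_ranks)

def mrr_at_k(first_relevant_ranks, k=10):
    """Mean reciprocal rank, counting 0 when the hit falls outside the top k."""
    return sum(1.0 / r if r <= k else 0.0
               for r in first_relevant_ranks) / len(first_relevant_ranks)

ranks = [1, 3, 2, 1, 11]        # toy first-relevant ranks for 5 queries
print(accuracy_at_k(ranks, 1))  # 0.4
print(accuracy_at_k(ranks, 3))  # 0.8
print(mrr_at_k(ranks))          # (1 + 1/3 + 1/2 + 1 + 0) / 5 ≈ 0.5667
```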
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 196 training samples
+ * Columns: <code>positive</code> and <code>anchor</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | positive                                                                            | anchor                                                                             |
+   |:--------|:------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+   | type    | string                                                                              | string                                                                             |
+   | details | <ul><li>min: 15 tokens</li><li>mean: 46.58 tokens</li><li>max: 118 tokens</li></ul> | <ul><li>min: 10 tokens</li><li>mean: 17.25 tokens</li><li>max: 43 tokens</li></ul> |
+ * Samples:
+   | positive | anchor |
+   |:---------|:-------|
+   | <code>The document lists several tasks with their statuses, such as "Done", "In progress", and "To be done". These statuses indicate the current progress of each task within the project. For example, "Set up environment" and "Set up development environment" are marked as "Done", suggesting these tasks have been completed, while "Build translation data set" is marked as "In progress", indicating it is currently being worked on.</code> | <code>What is the status of the project tasks mentioned in the document?</code> |
+   | <code>The 'Web Application Construction' task is mentioned to be completed by NT Q, with a duration from July 17, 2023, to July 28, 2023, and is marked as 'Done' with a completion of 10 tasks.</code> | <code>What is the scope of the 'Web Application Construction' task?</code> |
+   | <code>"RE F" could potentially stand for "Reference File" or "Record File," indicating that this text might be part of a larger dataset or document used for reference or record-keeping purposes.</code> | <code>What is the significance of the "RE F" at the beginning of the text?</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [
+           768,
+           512,
+           256,
+           128,
+           64
+       ],
+       "matryoshka_weights": [
+           1,
+           1,
+           1,
+           1,
+           1
+       ],
+       "n_dims_per_step": -1
+   }
+   ```
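The parameters above mean the same ranking loss is computed at every truncated embedding size and the results are combined with uniform weights. A schematic of that weighting (a sketch of the idea, not the sentence-transformers implementation; `inner_loss` and the dummy loss below are hypothetical stand-ins):

```python
import numpy as np

def matryoshka_total_loss(inner_loss, embeddings_a, embeddings_b,
                          dims=(768, 512, 256, 128, 64),
                          weights=(1, 1, 1, 1, 1)):
    """Sum the inner loss computed on each truncated embedding width,
    scaled by the corresponding Matryoshka weight."""
    total = 0
    for dim, w in zip(dims, weights):
        total += w * inner_loss(embeddings_a[:, :dim], embeddings_b[:, :dim])
    return total

# Toy check with a dummy loss that just reports the truncated width
a = np.zeros((2, 768))
b = np.zeros((2, 768))
dummy = lambda x, y: x.shape[1]
print(matryoshka_total_loss(dummy, a, b))  # 768+512+256+128+64 = 1728
```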
591
+
592
+ ### Training Hyperparameters
593
+ #### Non-Default Hyperparameters
594
+
595
+ - `eval_strategy`: epoch
596
+ - `per_device_train_batch_size`: 32
597
+ - `per_device_eval_batch_size`: 16
598
+ - `gradient_accumulation_steps`: 16
599
+ - `learning_rate`: 2e-05
600
+ - `num_train_epochs`: 4
601
+ - `lr_scheduler_type`: cosine
602
+ - `warmup_ratio`: 0.1
603
+ - `tf32`: False
604
+ - `load_best_model_at_end`: True
605
+ - `optim`: adamw_torch_fused
606
+ - `batch_sampler`: no_duplicates
607
+
608
+ #### All Hyperparameters
609
+ <details><summary>Click to expand</summary>
610
+
611
+ - `overwrite_output_dir`: False
612
+ - `do_predict`: False
613
+ - `eval_strategy`: epoch
614
+ - `prediction_loss_only`: True
615
+ - `per_device_train_batch_size`: 32
616
+ - `per_device_eval_batch_size`: 16
617
+ - `per_gpu_train_batch_size`: None
618
+ - `per_gpu_eval_batch_size`: None
619
+ - `gradient_accumulation_steps`: 16
620
+ - `eval_accumulation_steps`: None
621
+ - `learning_rate`: 2e-05
622
+ - `weight_decay`: 0.0
623
+ - `adam_beta1`: 0.9
624
+ - `adam_beta2`: 0.999
625
+ - `adam_epsilon`: 1e-08
626
+ - `max_grad_norm`: 1.0
627
+ - `num_train_epochs`: 4
628
+ - `max_steps`: -1
629
+ - `lr_scheduler_type`: cosine
630
+ - `lr_scheduler_kwargs`: {}
631
+ - `warmup_ratio`: 0.1
632
+ - `warmup_steps`: 0
633
+ - `log_level`: passive
634
+ - `log_level_replica`: warning
635
+ - `log_on_each_node`: True
636
+ - `logging_nan_inf_filter`: True
637
+ - `save_safetensors`: True
638
+ - `save_on_each_node`: False
639
+ - `save_only_model`: False
640
+ - `restore_callback_states_from_checkpoint`: False
641
+ - `no_cuda`: False
642
+ - `use_cpu`: False
643
+ - `use_mps_device`: False
644
+ - `seed`: 42
645
+ - `data_seed`: None
646
+ - `jit_mode_eval`: False
647
+ - `use_ipex`: False
648
+ - `bf16`: False
649
+ - `fp16`: False
650
+ - `fp16_opt_level`: O1
651
+ - `half_precision_backend`: auto
652
+ - `bf16_full_eval`: False
653
+ - `fp16_full_eval`: False
654
+ - `tf32`: False
655
+ - `local_rank`: 0
656
+ - `ddp_backend`: None
657
+ - `tpu_num_cores`: None
658
+ - `tpu_metrics_debug`: False
659
+ - `debug`: []
660
+ - `dataloader_drop_last`: False
661
+ - `dataloader_num_workers`: 0
662
+ - `dataloader_prefetch_factor`: None
663
+ - `past_index`: -1
664
+ - `disable_tqdm`: False
665
+ - `remove_unused_columns`: True
666
+ - `label_names`: None
667
+ - `load_best_model_at_end`: True
668
+ - `ignore_data_skip`: False
669
+ - `fsdp`: []
670
+ - `fsdp_min_num_params`: 0
671
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
672
+ - `fsdp_transformer_layer_cls_to_wrap`: None
673
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
674
+ - `deepspeed`: None
675
+ - `label_smoothing_factor`: 0.0
676
+ - `optim`: adamw_torch_fused
677
+ - `optim_args`: None
678
+ - `adafactor`: False
679
+ - `group_by_length`: False
680
+ - `length_column_name`: length
681
+ - `ddp_find_unused_parameters`: None
682
+ - `ddp_bucket_cap_mb`: None
683
+ - `ddp_broadcast_buffers`: False
684
+ - `dataloader_pin_memory`: True
685
+ - `dataloader_persistent_workers`: False
686
+ - `skip_memory_metrics`: True
687
+ - `use_legacy_prediction_loop`: False
688
+ - `push_to_hub`: False
689
+ - `resume_from_checkpoint`: None
690
+ - `hub_model_id`: None
691
+ - `hub_strategy`: every_save
692
+ - `hub_private_repo`: False
693
+ - `hub_always_push`: False
694
+ - `gradient_checkpointing`: False
695
+ - `gradient_checkpointing_kwargs`: None
696
+ - `include_inputs_for_metrics`: False
697
+ - `eval_do_concat_batches`: True
698
+ - `fp16_backend`: auto
699
+ - `push_to_hub_model_id`: None
700
+ - `push_to_hub_organization`: None
701
+ - `mp_parameters`:
702
+ - `auto_find_batch_size`: False
703
+ - `full_determinism`: False
704
+ - `torchdynamo`: None
705
+ - `ray_scope`: last
706
+ - `ddp_timeout`: 1800
707
+ - `torch_compile`: False
708
+ - `torch_compile_backend`: None
709
+ - `torch_compile_mode`: None
710
+ - `dispatch_batches`: None
711
+ - `split_batches`: None
712
+ - `include_tokens_per_second`: False
713
+ - `include_num_input_tokens_seen`: False
714
+ - `neftune_noise_alpha`: None
715
+ - `optim_target_modules`: None
716
+ - `batch_eval_metrics`: False
717
+ - `eval_on_start`: False
718
+ - `batch_sampler`: no_duplicates
719
+ - `multi_dataset_batch_sampler`: proportional
720
+
721
+ </details>
722
+
723
+ ### Training Logs
724
+ | Epoch | Step | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
725
+ |:-------:|:-----:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
726
+ | 1.0 | 1 | 0.6908 | 0.7097 | 0.8111 | 0.6240 | 0.8011 |
727
+ | 2.0 | 2 | 0.7292 | 0.7692 | 0.8177 | 0.6634 | 0.8162 |
728
+ | 3.0 | 3 | 0.7555 | 0.8014 | 0.8541 | 0.6992 | 0.8451 |
729
+ | **4.0** | **4** | **0.7866** | **0.8017** | **0.8551** | **0.6978** | **0.8464** |
730
+
731
+ * The bold row denotes the saved checkpoint.
732
+
733
+ ### Framework Versions
734
+ - Python: 3.10.13
735
+ - Sentence Transformers: 3.0.1
736
+ - Transformers: 4.42.3
737
+ - PyTorch: 2.1.2
738
+ - Accelerate: 0.32.1
739
+ - Datasets: 2.20.0
740
+ - Tokenizers: 0.19.1
741
+
742
+ ## Citation
743
+
744
+ ### BibTeX
745
+
746
+ #### Sentence Transformers
747
+ ```bibtex
748
+ @inproceedings{reimers-2019-sentence-bert,
749
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
750
+ author = "Reimers, Nils and Gurevych, Iryna",
751
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
752
+ month = "11",
753
+ year = "2019",
754
+ publisher = "Association for Computational Linguistics",
755
+ url = "https://arxiv.org/abs/1908.10084",
756
+ }
757
+ ```
758
+
759
+ #### MatryoshkaLoss
760
+ ```bibtex
761
+ @misc{kusupati2024matryoshka,
762
+ title={Matryoshka Representation Learning},
763
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
764
+ year={2024},
765
+ eprint={2205.13147},
766
+ archivePrefix={arXiv},
767
+ primaryClass={cs.LG}
768
+ }
769
+ ```
770
+
771
+ #### MultipleNegativesRankingLoss
772
+ ```bibtex
773
+ @misc{henderson2017efficient,
774
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
775
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
776
+ year={2017},
777
+ eprint={1705.00652},
778
+ archivePrefix={arXiv},
779
+ primaryClass={cs.CL}
780
+ }
781
+ ```
782
+
783
+ <!--
784
+ ## Glossary
785
+
786
+ *Clearly define terms in order to be accessible across audiences.*
787
+ -->
788
+
789
+ <!--
790
+ ## Model Card Authors
791
+
792
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
793
+ -->
794
+
795
+ <!--
796
+ ## Model Card Contact
797
+
798
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
799
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.42.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.0.1",
+ "transformers": "4.42.3",
+ "pytorch": "2.1.2"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d453a59f4a0c266f4212d3ebf209fe24ab32875765f79f071b05b617ef1f3ac3
+ size 437951328
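As a quick sanity check, the file size above is consistent with float32 BERT-base weights. A minimal sketch; note the safetensors JSON header adds a few KB, so the implied parameter count slightly overstates the true figure (~109.48M for BERT-base including the pooler):

```python
# Rough parameter count implied by the file size above, assuming
# every value is stored as float32 (4 bytes each).
size_bytes = 437_951_328          # "size" field from model.safetensors
approx_params = size_bytes // 4   # 4 bytes per float32 value
print(f"~{approx_params / 1e6:.1f}M parameters")  # ~109.5M parameters
```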
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
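The three modules above run in sequence: the Transformer produces per-token embeddings, Pooling with `pooling_mode_cls_token` keeps only the first ([CLS]) token's vector, and Normalize scales it to unit L2 norm. A minimal numpy sketch of the last two stages on dummy hidden states (not the library implementation; shapes are shrunk for brevity — the real hidden size is 768):

```python
import numpy as np

# Dummy transformer output: (batch, seq_len, hidden) — a stand-in for
# the real (batch, seq_len, 768) token embeddings.
hidden_states = np.random.rand(2, 5, 4).astype(np.float32)

# Pooling with pooling_mode_cls_token=true: take the first token's
# vector as the sentence embedding.
sentence_emb = hidden_states[:, 0, :]

# Normalize module: rescale each embedding to unit L2 norm, so a dot
# product between two embeddings equals their cosine similarity.
norms = np.linalg.norm(sentence_emb, axis=1, keepdims=True)
sentence_emb = sentence_emb / norms

print(np.linalg.norm(sentence_emb, axis=1))  # each entry ≈ 1.0
```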
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff