sachindatasociety committed
Commit: 8989465
Parent: 35822e1

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
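This config enables CLS-token pooling only: the sentence embedding is the final hidden state of the `[CLS]` token rather than a mean over all token states. A minimal sketch of the equivalent `Pooling` module in sentence-transformers (constructing it standalone is illustrative; the library normally loads it from this folder):

```python
from sentence_transformers.models import Pooling

# Same behavior as 1_Pooling/config.json: CLS-token pooling over
# 768-dimensional BERT hidden states, with all other modes disabled.
pooling = Pooling(word_embedding_dimension=768, pooling_mode="cls")

print(pooling.get_pooling_mode_str())              # cls
print(pooling.get_sentence_embedding_dimension())  # 768
```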
README.md ADDED
@@ -0,0 +1,432 @@
+ ---
+ base_model: BAAI/bge-base-en-v1.5
+ library_name: sentence-transformers
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:183
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: Introduction to Network Protocols
+   sentences:
+   - 'Introduction to Network Protocols A course that builds foundational knowledge
+     of network protocols essentially covering emails and other internet protocols
+     Course language: TBD Prerequisite course required: Introduction to Managing Servers
+     Professionals who would like to get foundational knowledge of basic network protocols'
+   - 'Course language: TBD'
+   - 'Prerequisite course required: Introduction to Managing Servers'
+   - A course that builds foundational knowledge of network protocols essentially covering
+     emails and other internet protocols
+   - Professionals who would like to get foundational knowledge of basic network protocols
+ - source_sentence: Optimizing Ensemble Methods
+   sentences:
+   - 'Course language: Python'
+   - 'Prerequisite course required: Ensemble Methods'
+   - This course covers advanced topics in optimizing ensemble learning methods – specifically
+     random forest and gradient boosting. Students will learn to implement base models
+     and perform hyperparameter tuning to enhance the performance of models.
+   - Professionals experience in ensemble methods and who want to enhance their skill
+     set in advanced Python classification techniques.
+   - 'Optimizing Ensemble Methods This course covers advanced topics in optimizing
+     ensemble learning methods – specifically random forest and gradient boosting.
+     Students will learn to implement base models and perform hyperparameter tuning
+     to enhance the performance of models. Course language: Python Prerequisite course
+     required: Ensemble Methods Professionals experience in ensemble methods and who
+     want to enhance their skill set in advanced Python classification techniques.'
+ - source_sentence: Autoencoders
+   sentences:
+   - Professionals some Python experience who would like to expand their skillset to
+     more advanced machine learning algorithms for image processing and computer vision.
+   - 'Prerequisite course required: Convolutional Neural Networks (CNN) for Image Recognition'
+   - 'Course language: Python'
+   - 'Autoencoders This course takes students through a journey into the world od autoencoders
+     - a set of powerful deep learning models that have a special place in the world
+     of image analysis. By the end of this course students will be able to navigate
+     through the application space of autoencoders and implement autoencoders to perform
+     tasks such as image denoising and more. Course language: Python Prerequisite course
+     required: Convolutional Neural Networks (CNN) for Image Recognition Professionals
+     some Python experience who would like to expand their skillset to more advanced
+     machine learning algorithms for image processing and computer vision.'
+   - This course takes students through a journey into the world od autoencoders -
+     a set of powerful deep learning models that have a special place in the world
+     of image analysis. By the end of this course students will be able to navigate
+     through the application space of autoencoders and implement autoencoders to perform
+     tasks such as image denoising and more.
+ - source_sentence: Authentication Python
+   sentences:
+   - 'Prerequisite course required: Basic GraphQL: Python'
+   - 'Authentication Python An introduction to Authentication concepts and how it can
+     be implemented using Python. Course language: Python Prerequisite course required:
+     Basic GraphQL: Python Professionals who would like to learn the core concepts
+     of authentication using Python.'
+   - An introduction to Authentication concepts and how it can be implemented using
+     Python.
+   - 'Course language: Python'
+   - Professionals who would like to learn the core concepts of authentication using
+     Python.
+ - source_sentence: Clustering in NLP
+   sentences:
+   - 'Clustering in NLP This course covers the clustering concepts of natural language
+     processing, equipping learners with the ability to cluster text data into groups
+     and topics by finding similarities between different documents. Course language:
+     Python Prerequisite course required: Topic Modeling in NLP This is an intermediate
+     level course for data scientists who have some experience with NLP and want to
+     learn to cluster textual data.'
+   - 'Course language: Python'
+   - 'Prerequisite course required: Topic Modeling in NLP'
+   - This course covers the clustering concepts of natural language processing, equipping
+     learners with the ability to cluster text data into groups and topics by finding
+     similarities between different documents.
+   - This is an intermediate level course for data scientists who have some experience
+     with NLP and want to learn to cluster textual data.
+ ---
+ 
+ # SentenceTransformer based on BAAI/bge-base-en-v1.5
+ 
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+ 
+ ### Full Model Architecture
+ 
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
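Note the trailing `Normalize()` module: every output embedding is scaled to unit length, so dot-product and cosine similarity give identical rankings for this model. A quick, illustrative check:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("datasocietyco/bge-base-en-v1.5-course-recommender-v1")
embedding = model.encode("Clustering in NLP")

# Normalize() guarantees unit-length vectors.
print(np.linalg.norm(embedding))  # ~1.0
```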
+ 
+ ## Usage
+ 
+ ### Direct Usage (Sentence Transformers)
+ 
+ First install the Sentence Transformers library:
+ 
+ ```bash
+ pip install -U sentence-transformers
+ ```
+ 
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+ 
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("datasocietyco/bge-base-en-v1.5-course-recommender-v1")
+ # Run inference
+ sentences = [
+     'Clustering in NLP',
+     'This course covers the clustering concepts of natural language processing, equipping learners with the ability to cluster text data into groups and topics by finding similarities between different documents.',
+     'Course language: Python',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+ 
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
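Beyond pairwise similarity, the same embeddings can back a small course-recommendation lookup. An illustrative sketch (the query and candidate list below are made up; only the model ID comes from this card):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("datasocietyco/bge-base-en-v1.5-course-recommender-v1")

# Hypothetical corpus of course titles to recommend from.
courses = [
    "Clustering in NLP",
    "Foundations of Big Data",
    "Authentication Python",
]
query = "group similar documents by topic"  # illustrative query

query_emb = model.encode([query])
course_embs = model.encode(courses)

# Embeddings are L2-normalized, so these cosine scores equal dot products.
scores = model.similarity(query_emb, course_embs)[0]
for course, score in sorted(zip(courses, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {course}")
```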
+ 
+ <!--
+ ### Direct Usage (Transformers)
+ 
+ <details><summary>Click to see the direct usage in Transformers</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+ 
+ You can finetune this model on your own dataset.
+ 
+ <details><summary>Click to expand</summary>
+ 
+ </details>
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 183 training samples
+ * Columns: <code>name</code>, <code>description</code>, <code>languages</code>, <code>prerequisites</code>, <code>target_audience</code>, and <code>merged</code>
+ * Approximate statistics based on the first 183 samples:
+   |         | name | description | languages | prerequisites | target_audience | merged |
+   |:--------|:-----|:------------|:----------|:--------------|:----------------|:-------|
+   | type    | string | string | string | string | string | string |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 7.06 tokens</li><li>max: 16 tokens</li></ul> | <ul><li>min: 13 tokens</li><li>mean: 40.5 tokens</li><li>max: 117 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 6.66 tokens</li><li>max: 10 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 12.56 tokens</li><li>max: 21 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 23.2 tokens</li><li>max: 54 tokens</li></ul> | <ul><li>min: 45 tokens</li><li>mean: 81.98 tokens</li><li>max: 174 tokens</li></ul> |
+ * Samples:
+   | name | description | languages | prerequisites | target_audience | merged |
+   |:-----|:------------|:----------|:--------------|:----------------|:-------|
+   | <code>Foundations of Big Data</code> | <code>A theoretical course covering topics on how to handle data at scale and the different tools needed for distributed data storage, analysis, and management. Learners will be able to dive into the vast world of data and computing at scale and get a comprehensive overview of distributed computing.</code> | <code>Course language: TBD</code> | <code>Prerequisite course required: Optimizing Ensemble Methods</code> | <code>Professionals who would like to learn the core concepts of big data and understand data at scale</code> | <code>Foundations of Big Data A theoretical course covering topics on how to handle data at scale and the different tools needed for distributed data storage, analysis, and management. Learners will be able to dive into the vast world of data and computing at scale and get a comprehensive overview of distributed computing. Course language: TBD Prerequisite course required: Optimizing Ensemble Methods Professionals who would like to learn the core concepts of big data and understand data at scale</code> |
+   | <code>Big Data Orchestration & Workflow Management</code> | <code>A theoretical course covering topics on how to handle data at scale and the different tools needed for orchestrating big data systems and manage the workflow. Learners will be able to dive into the vast world of data and computing at scale and get a comprehensive overview of the distributed resource management ecosystem.</code> | <code>Course language: TBD</code> | <code>Prerequisite course required: Foundations of Big Data</code> | <code>Professionals who would like to learn the core concepts of distributed system orchestration and workflow management tools.</code> | <code>Big Data Orchestration & Workflow Management A theoretical course covering topics on how to handle data at scale and the different tools needed for orchestrating big data systems and manage the workflow. Learners will be able to dive into the vast world of data and computing at scale and get a comprehensive overview of the distributed resource management ecosystem. Course language: TBD Prerequisite course required: Foundations of Big Data Professionals who would like to learn the core concepts of distributed system orchestration and workflow management tools.</code> |
+   | <code>Distributed Data Storage (Hadoop)</code> | <code>A course that covers theory and implementation on a specific cloud platform covering topics on distributed data storage systems. Learners will be able to dive into the nature of storing and processing data at scale using tools like Hadoop on a selected cloud platform. This course will allow students to get a great foundation for creating and managing distributed data storage resources.</code> | <code>Course language: Java, Python</code> | <code>Prerequisite course required: Foundations of Big Data</code> | <code>Professionals who have coding knowledge and want to learn to create a scalable data storage solution using cloud services.</code> | <code>Distributed Data Storage (Hadoop) A course that covers theory and implementation on a specific cloud platform covering topics on distributed data storage systems. Learners will be able to dive into the nature of storing and processing data at scale using tools like Hadoop on a selected cloud platform. This course will allow students to get a great foundation for creating and managing distributed data storage resources. Course language: Java, Python Prerequisite course required: Foundations of Big Data Professionals who have coding knowledge and want to learn to create a scalable data storage solution using cloud services.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
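MultipleNegativesRankingLoss treats each (anchor, positive) pair in a batch as a positive example and uses every other sample's positive as an in-batch negative, which is why the `no_duplicates` batch sampler listed under the hyperparameters below matters: duplicate texts in one batch would create false negatives. A sketch of how this loss is typically instantiated in sentence-transformers (parameter values taken from this card; the variable names are illustrative):

```python
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# scale=20.0 multiplies the cosine similarities before the softmax
# cross-entropy over in-batch candidates; util.cos_sim matches
# "similarity_fct": "cos_sim" above.
loss = losses.MultipleNegativesRankingLoss(
    model, scale=20.0, similarity_fct=util.cos_sim
)
```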
+ 
+ ### Evaluation Dataset
+ 
+ #### Unnamed Dataset
+ 
+ * Size: 50 evaluation samples
+ * Columns: <code>name</code>, <code>description</code>, <code>languages</code>, <code>prerequisites</code>, <code>target_audience</code>, and <code>merged</code>
+ * Approximate statistics based on the first 50 samples:
+   |         | name | description | languages | prerequisites | target_audience | merged |
+   |:--------|:-----|:------------|:----------|:--------------|:----------------|:-------|
+   | type    | string | string | string | string | string | string |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 6.98 tokens</li><li>max: 15 tokens</li></ul> | <ul><li>min: 16 tokens</li><li>mean: 39.66 tokens</li><li>max: 83 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 6.66 tokens</li><li>max: 10 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 12.58 tokens</li><li>max: 21 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 24.06 tokens</li><li>max: 54 tokens</li></ul> | <ul><li>min: 47 tokens</li><li>mean: 81.94 tokens</li><li>max: 139 tokens</li></ul> |
+ * Samples:
+   | name | description | languages | prerequisites | target_audience | merged |
+   |:-----|:------------|:----------|:--------------|:----------------|:-------|
+   | <code>Word Embeddings in NLP</code> | <code>This course covers the intermediate concepts of natural language processing like creating word embeddings, feature engineering and word embeddings for finding text features for model development.</code> | <code>Course language: Python</code> | <code>Prerequisite course required: Topic Modeling in NLP</code> | <code>This is an intermediate level course for data scientists who have experience in NLP and want to learn to process and mine natural language and text data.</code> | <code>Word Embeddings in NLP This course covers the intermediate concepts of natural language processing like creating word embeddings, feature engineering and word embeddings for finding text features for model development. Course language: Python Prerequisite course required: Topic Modeling in NLP This is an intermediate level course for data scientists who have experience in NLP and want to learn to process and mine natural language and text data.</code> |
+   | <code>Big Data Orchestration & Workflow Management</code> | <code>A theoretical course covering topics on how to handle data at scale and the different tools needed for orchestrating big data systems and manage the workflow. Learners will be able to dive into the vast world of data and computing at scale and get a comprehensive overview of the distributed resource management ecosystem.</code> | <code>Course language: TBD</code> | <code>Prerequisite course required: Foundations of Big Data</code> | <code>Professionals who would like to learn the core concepts of distributed system orchestration and workflow management tools.</code> | <code>Big Data Orchestration & Workflow Management A theoretical course covering topics on how to handle data at scale and the different tools needed for orchestrating big data systems and manage the workflow. Learners will be able to dive into the vast world of data and computing at scale and get a comprehensive overview of the distributed resource management ecosystem. Course language: TBD Prerequisite course required: Foundations of Big Data Professionals who would like to learn the core concepts of distributed system orchestration and workflow management tools.</code> |
+   | <code>Accelerating Data Engineering Pipelines</code> | <code>Explore how to employ advanced data engineering tools and techniques with GPUs to significantly improve data engineering pipelines</code> | <code>Course language: Python</code> | <code>No prerequisite course required</code> | <code>Professionals who wants to learn the foundation of data science and lays the groundwork for analysis and modeling.</code> | <code>Accelerating Data Engineering Pipelines Explore how to employ advanced data engineering tools and techniques with GPUs to significantly improve data engineering pipelines Course language: Python No prerequisite course required Professionals who wants to learn the foundation of data science and lays the groundwork for analysis and modeling.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+ 
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+ 
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `learning_rate`: 3e-06
+ - `max_steps`: 64
+ - `warmup_ratio`: 0.1
+ - `batch_sampler`: no_duplicates
+ 
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+ 
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 3e-06
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 3.0
+ - `max_steps`: 64
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+ 
+ </details>
+ 
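The non-default values above map directly onto `SentenceTransformerTrainingArguments`. A minimal sketch of a comparable run, assuming the course dataset is available as a `datasets.Dataset` with the six columns listed earlier; the toy rows and the 20-step eval cadence are assumptions, not part of the card:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

# Stand-in rows with the card's six columns; the real datasets hold
# 183 training and 50 evaluation samples.
rows = {
    "name": ["Clustering in NLP", "Foundations of Big Data"],
    "description": ["Covers clustering for text data.",
                    "A theoretical course on data at scale."],
    "languages": ["Course language: Python", "Course language: TBD"],
    "prerequisites": ["Prerequisite course required: Topic Modeling in NLP",
                      "Prerequisite course required: Optimizing Ensemble Methods"],
    "target_audience": ["Data scientists with some NLP experience.",
                        "Professionals new to big data."],
    "merged": ["Clustering in NLP Covers clustering for text data.",
               "Foundations of Big Data A theoretical course on data at scale."],
}
train_dataset = Dataset.from_dict(rows)
eval_dataset = Dataset.from_dict(rows)

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-en-v1.5-course-recommender-v1",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=3e-6,
    max_steps=64,                               # takes precedence over num_train_epochs
    warmup_ratio=0.1,
    eval_strategy="steps",
    eval_steps=20,                              # assumption: matches the 20-step log cadence
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # no duplicate texts within a batch
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()
```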
+ 
+ ### Training Logs
+ | Epoch  | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 1.6667 | 20   | 1.4345        | 1.0243          |
+ | 3.3333 | 40   | 0.9835        | 0.7613          |
+ | 5.0    | 60   | 0.7294        | 0.6593          |
+ 
+ ### Framework Versions
+ - Python: 3.9.13
+ - Sentence Transformers: 3.1.1
+ - Transformers: 4.45.1
+ - PyTorch: 2.2.2
+ - Accelerate: 0.34.2
+ - Datasets: 3.0.0
+ - Tokenizers: 0.20.0
+ 
+ ## Citation
+ 
+ ### BibTeX
+ 
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+ 
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "BAAI/bge-base-en-v1.5",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.45.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
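This is a standard BERT-base configuration (12 layers, 12 heads, hidden size 768). The card's commented-out "Direct Usage (Transformers)" section could be filled in along these lines: tokenize, run `BertModel`, take the `[CLS]` hidden state (matching the pooling config above), and L2-normalize (matching the `Normalize` module). A sketch under those assumptions, not the card's official snippet:

```python
import torch
from transformers import AutoModel, AutoTokenizer

repo = "datasocietyco/bge-base-en-v1.5-course-recommender-v1"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModel.from_pretrained(repo)

batch = tokenizer(["Clustering in NLP"], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state            # (batch, seq, 768)

cls = hidden[:, 0]                                        # CLS-token pooling
embeddings = torch.nn.functional.normalize(cls, dim=-1)   # unit length, like Normalize()
```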
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.1.1",
+     "transformers": "4.45.1",
+     "pytorch": "2.2.2"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6900af4c831ea3426808bbb109bcc1f0fc5272c8e6add8183da8fdd767aa5156
+ size 437951328
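For scale, this Git LFS pointer's 437,951,328-byte payload is consistent with BERT-base weights stored in float32: roughly 109 million parameters × 4 bytes ≈ 438 MB, plus safetensors metadata.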
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff