tomaarsen (HF staff) committed
Commit 1d4d909 · verified · Parent(s): aa31d52

Add new CrossEncoder model
README.md ADDED
@@ -0,0 +1,460 @@
---
language:
- en
tags:
- sentence-transformers
- cross-encoder
- text-classification
- generated_from_trainer
- dataset_size:78704
- loss:ListNetLoss
base_model: microsoft/MiniLM-L12-H384-uncased
datasets:
- microsoft/ms_marco
pipeline_tag: text-classification
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
co2_eq_emissions:
  emissions: 205.4804729340415
  energy_consumed: 0.5286324046031189
  source: codecarbon
  training_type: fine-tuning
  on_cloud: false
  cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
  ram_total_size: 31.777088165283203
  hours_used: 1.686
  hardware_used: 1 x NVIDIA GeForce RTX 3090
model-index:
- name: CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
  results: []
---

# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) on the [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Cross Encoder
- **Base model:** [microsoft/MiniLM-L12-H384-uncased](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased) <!-- at revision 44acabbec0ef496f6dbc93adadea57f376b7c0ec -->
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
    - [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco)
- **Language:** en
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-identity")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
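
In practice, a cross encoder like this one is usually the second stage of a retrieve-then-rerank pipeline: a fast bi-encoder shortlists candidates, and the cross encoder re-scores each query/candidate pair jointly. Below is a minimal sketch of that pattern; the choice of retriever (`sentence-transformers/all-MiniLM-L6-v2`) and the three-document corpus are illustrative placeholders, not part of this model's training setup.

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

# Stage 1 retriever: any bi-encoder works; this particular model is an assumption
retriever = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
reranker = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-identity")

corpus = [
    "There are on average between 55 and 80 calories in an egg depending on its size.",
    "Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.",
    "Most of the calories in an egg come from the yellow yolk in the center.",
]
query = "How many calories in an egg"

# Embedding search over the corpus; returns [{'corpus_id': ..., 'score': ...}, ...]
hits = util.semantic_search(retriever.encode(query), retriever.encode(corpus), top_k=3)[0]
candidates = [corpus[hit["corpus_id"]] for hit in hits]

# Stage 2: the cross encoder reorders the shortlist
for entry in reranker.rank(query, candidates):
    print(f"{entry['score']:.4f}  {candidates[entry['corpus_id']]}")
```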

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Cross Encoder Reranking

* Datasets: `NanoMSMARCO`, `NanoNFCorpus` and `NanoNQ`
* Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator)

| Metric      | NanoMSMARCO          | NanoNFCorpus         | NanoNQ               |
|:------------|:---------------------|:---------------------|:---------------------|
| map         | 0.4847 (-0.0049)     | 0.3325 (+0.0716)     | 0.5967 (+0.1771)     |
| mrr@10      | 0.4768 (-0.0007)     | 0.5669 (+0.0670)     | 0.6024 (+0.1757)     |
| **ndcg@10** | **0.5573 (+0.0168)** | **0.3623 (+0.0373)** | **0.6499 (+0.1492)** |

#### Cross Encoder Nano BEIR

* Dataset: `NanoBEIR_R100_mean`
* Evaluated with [<code>CrossEncoderNanoBEIREvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderNanoBEIREvaluator)

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.4713 (+0.0813)     |
| mrr@10      | 0.5487 (+0.0807)     |
| **ndcg@10** | **0.5231 (+0.0678)** |
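
As a rough sketch of how these numbers can be reproduced with the evaluator linked above; the `dataset_names` values are an assumption (check the API reference of your installed sentence-transformers version):

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderNanoBEIREvaluator

model = CrossEncoder("tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-identity")

# The three NanoBEIR subsets reported above; lowercase identifiers are assumed
evaluator = CrossEncoderNanoBEIREvaluator(dataset_names=["msmarco", "nfcorpus", "nq"])
results = evaluator(model)  # dict of metrics such as NDCG@10, MRR@10 and MAP
```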
145
+
146
+ <!--
147
+ ## Bias, Risks and Limitations
148
+
149
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
150
+ -->
151
+
152
+ <!--
153
+ ### Recommendations
154
+
155
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
156
+ -->
157
+
158
+ ## Training Details
159
+
160
+ ### Training Dataset
161
+
162
+ #### ms_marco
163
+
164
+ * Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
165
+ * Size: 78,704 training samples
166
+ * Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
167
+ * Approximate statistics based on the first 1000 samples:
168
+ | | query | docs | labels |
169
+ |:--------|:-----------------------------------------------------------------------------------------------|:------------------------------------|:------------------------------------|
170
+ | type | string | list | list |
171
+ | details | <ul><li>min: 10 characters</li><li>mean: 33.93 characters</li><li>max: 99 characters</li></ul> | <ul><li>size: 10 elements</li></ul> | <ul><li>size: 10 elements</li></ul> |
172
+ * Samples:
173
+ | query | docs | labels |
174
+ |:------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------|
175
+ | <code>what types of moons are there</code> | <code>["The different types of moons are: Full Wolf Moon, Full Snow Moon, Full Worm Moon, paschal full moon, full pink moon, full flower moon, full strawberry moon, full buck moon, … full sturgeon moon, full harvest moon, full hunters moon, full beaver moon, full cold moon. The solar eclipse, when the moon blocks the sun's light from hitting the earth-creating a temporary blackout on earth, can occur only at the time of New Moon, while the luna … r eclipse, when the earth blocks the sun's light from reflecting off the moon, can occur only at the time of Full Moon.", 'Types of Moons. Full Moon names date back to Native Americans, of what is now the northern and eastern United States. The tribes kept track of the seasons by giving distinctive names to each recurring full Moon. Their names were applied to the entire month in which each occurred. There was some variation in the Moon names, but in general, the same ones were current throughout the Algonquin tribes from New England to Lake Superio...</code> | <code>[1, 1, 1, 0, 0, ...]</code> |
176
+ | <code>what is beryllium commonly combined with</code> | <code>['Beryllium is an industrial metal with some attractive attributes. It’s lighter than aluminum and 6x stronger than steel. It’s usually combined with other metals and is a key component in the aerospace and electronics industries. Beryllium is also used in the production of nuclear weapons. With that, you may not be surprised to learn that beryllium is one of the most toxic elements in existence. Beryllium is a Class A EPA carcinogen and exposure can cause Chronic Beryllium Disease, an often fatal lung disease. ', 'Beryllium is found in about 30 different mineral species. The most important are beryl (beryllium aluminium silicate) and bertrandite (beryllium silicate). Emerald and aquamarine are precious forms of beryl. The metal is usually prepared by reducing beryllium fluoride with magnesium metal. Uses. Beryllium is used in alloys with copper or nickel to make gyroscopes, springs, electrical contacts, spot-welding electrodes and non-sparking tools. Mixing beryllium with these metals...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
177
+ | <code>is turkish coffee healthy</code> | <code>["Calories, Fat and Other Basics. A serving of Turkish coffee contains about 46 calories. Though the drink doesn't contain any fat, it also doesn't supply any fiber or protein, two key nutrients needed for good health. The coffee doesn't supply an impressive amount of calcium or iron either. A blend of strong coffee, sugar and cardamom, Turkish coffee is more of a sweet treat than something similar to a regular cup of coffee. While there are certain health benefits from the coffee and cardamom, sugar is a major drawback when it comes to the nutritional benefits of the drink", "A serving of Turkish coffee contains about 11.5 grams of sugar, which is equal to almost 3 teaspoons. That's half of the 6 teaspoons women should limit themselves to each day and one-third of the 9 teaspoons men should set as their daily upper limit, according to the American Heart Association. A blend of strong coffee, sugar and cardamom, Turkish coffee is more of a sweet treat than something similar to a regula...</code> | <code>[1, 1, 0, 0, 0, ...]</code> |
178
+ * Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
179
+ ```json
180
+ {
181
+ "eps": 1e-10,
182
+ "pad_value": -1,
183
+ "activation_fct": "torch.nn.modules.linear.Identity"
184
+ }
185
+ ```
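
For reference, the loss above can be constructed roughly as follows. This is a sketch based on the parameter listing; the exact `ListNetLoss` signature may differ between sentence-transformers versions:

```python
import torch
from sentence_transformers.cross_encoder import CrossEncoder
from sentence_transformers.cross_encoder.losses import ListNetLoss

model = CrossEncoder("microsoft/MiniLM-L12-H384-uncased", num_labels=1)
# torch.nn.Identity corresponds to the "identity" in this model's name;
# eps and pad_value are left at the defaults shown in the JSON above.
loss = ListNetLoss(model, activation_fct=torch.nn.Identity())
```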

### Evaluation Dataset

#### ms_marco

* Dataset: [ms_marco](https://huggingface.co/datasets/microsoft/ms_marco) at [a47ee7a](https://huggingface.co/datasets/microsoft/ms_marco/tree/a47ee7aae8d7d466ba15f9f0bfac3b3681087b3a)
* Size: 1,000 evaluation samples
* Columns: <code>query</code>, <code>docs</code>, and <code>labels</code>
* Approximate statistics based on the first 1000 samples:
  |         | query  | docs | labels |
  |:--------|:-------|:-----|:-------|
  | type    | string | list | list   |
  | details | <ul><li>min: 10 characters</li><li>mean: 33.81 characters</li><li>max: 110 characters</li></ul> | <ul><li>size: 10 elements</li></ul> | <ul><li>size: 10 elements</li></ul> |
* Samples:
  | query | docs | labels |
  |:------|:-----|:-------|
  | <code>what is a fishy smell on humans</code> | <code>["Trimethylaminuria (TMAU), also known as fish odor syndrome or fish malodor syndrome, is a rare metabolic disorder where Trimethylamine is released in the person's sweat, urine, and breath, giving off a strong fishy odor or strong body odor. Body odor is generally considered to be an unpleasant odor among many human cultures.", "The trimethylamine is released in the person's sweat, urine, reproductive fluids, and breath, giving off a strong fishy or body odor. Some people with trimethylaminuria have a strong odor all the time, but most have a moderate smell that varies in intensity over time. Although FMO3 mutations account for most known cases of trimethylaminuria, some cases are caused by other factors. A fish-like body odor could result from an excess of certain proteins in the diet or from an increase in bacteria in the digestive system.", 'Trimethylaminuria is a disorder in which the body is unable to break down trimethylamine, a chemical compound that has a pungent odor. Trimeth...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>how to cut woodworking joints</code> | <code>['The tails and pins interlock to form a strong 90-degree joint. Dovetail joints are technically complex and are often used to create drawer boxes for furniture. Through mortise and tenon – To form this joint, a round or square hole (called a mortise) is cut through the side of one piece of wood. The end of the other piece of wood is cut to have a projection (the tenon) that matches the mortise. The tenon is placed into the mortise, projecting out from the other side of the wood. A wedge is hammered into a hole in the tenon. The wedge keeps the tenon from sliding out of the mortise.', "Wood joinery is simply the method by which two pieces of wood are connected. In many cases, the appearance of a joint becomes at least as important as it's strength. Wood joinery encompasses everything from intricate half-blind dovetails to connections that are simply nailed, glued or screwed. How to Use Biscuit Joints. Share. Doweling as a method of joinery is simple: a few dowels are glued into matchin...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
  | <code>how long does it take to be a paramedic</code> | <code>['In Kansas, you first have to take an EMT course which is roughly 6 months long depending on where you take the course. Then you have to take A&P, English Comp, Sociology, Algebra, & Interpersonal Communication as pre-requisites for Paramedic. EMT is 110 hours which can be done in 3 weeks or dragged out for several months by just going one night per week to class. The Paramedic is 600 - 1200 hours in length depending on the state and averages about 6 - 9 months of training.', 'Coursework and training to become an EMT-basic or first responder can generally be completed in as little as three weeks on an accelerated basis. For part-time students, these programs may take around 8-11 weeks to complete. To become an EMT-intermediate 1985 or 1999, students generally must complete 30-350 hours of training. This training requirement varies according to the procedures the state allows these EMTs to perform.', 'How long does it take to be a paramedic depends on the area of study and the skill on...</code> | <code>[1, 0, 0, 0, 0, ...]</code> |
* Loss: [<code>ListNetLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#listnetloss) with these parameters:
  ```json
  {
      "eps": 1e-10,
      "pad_value": -1,
      "activation_fct": "torch.nn.modules.linear.Identity"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 6
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
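
The non-default values above translate roughly into the following training-arguments sketch, assuming the `CrossEncoderTrainingArguments` class available in recent sentence-transformers releases; the field names mirror the Hugging Face `TrainingArguments` entries listed above, and the `output_dir` is hypothetical:

```python
from sentence_transformers.cross_encoder import CrossEncoderTrainingArguments

args = CrossEncoderTrainingArguments(
    output_dir="reranker-msmarco-minilm-listnet",  # hypothetical output path
    num_train_epochs=3,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    seed=12,
    bf16=True,                    # mixed precision on Ampere-class GPUs
    eval_strategy="steps",
    load_best_model_at_end=True,  # keep the checkpoint with the best validation metric
)
```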

### Training Logs
| Epoch      | Step      | Training Loss | Validation Loss | NanoMSMARCO_ndcg@10  | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10       | NanoBEIR_R100_mean_ndcg@10 |
|:----------:|:---------:|:-------------:|:---------------:|:--------------------:|:--------------------:|:--------------------:|:--------------------------:|
| -1         | -1        | -             | -               | 0.0155 (-0.5250)     | 0.3609 (+0.0358)     | 0.0410 (-0.4597)     | 0.1391 (-0.3163)           |
| 0.0001     | 1         | 2.1559        | -               | -                    | -                    | -                    | -                          |
| 0.0762     | 1000      | 2.0862        | -               | -                    | -                    | -                    | -                          |
| 0.1525     | 2000      | 2.0787        | -               | -                    | -                    | -                    | -                          |
| 0.2287     | 3000      | 2.0785        | -               | -                    | -                    | -                    | -                          |
| 0.3049     | 4000      | 2.0738        | 2.0755          | 0.5129 (-0.0276)     | 0.3371 (+0.0120)     | 0.5561 (+0.0555)     | 0.4687 (+0.0133)           |
| 0.3812     | 5000      | 2.0828        | -               | -                    | -                    | -                    | -                          |
| 0.4574     | 6000      | 2.0711        | -               | -                    | -                    | -                    | -                          |
| 0.5336     | 7000      | 2.072         | -               | -                    | -                    | -                    | -                          |
| 0.6098     | 8000      | 2.0721        | 2.0734          | 0.5627 (+0.0222)     | 0.3547 (+0.0296)     | 0.5691 (+0.0684)     | 0.4955 (+0.0401)           |
| 0.6861     | 9000      | 2.0714        | -               | -                    | -                    | -                    | -                          |
| 0.7623     | 10000     | 2.0744        | -               | -                    | -                    | -                    | -                          |
| 0.8385     | 11000     | 2.0708        | -               | -                    | -                    | -                    | -                          |
| **0.9148** | **12000** | **2.0705**    | **2.0732**      | **0.5573 (+0.0168)** | **0.3623 (+0.0373)** | **0.6499 (+0.1492)** | **0.5231 (+0.0678)**       |
| 0.9910     | 13000     | 2.0721        | -               | -                    | -                    | -                    | -                          |
| 1.0672     | 14000     | 2.065         | -               | -                    | -                    | -                    | -                          |
| 1.1435     | 15000     | 2.0732        | -               | -                    | -                    | -                    | -                          |
| 1.2197     | 16000     | 2.07          | 2.0729          | 0.5673 (+0.0269)     | 0.3563 (+0.0312)     | 0.5877 (+0.0870)     | 0.5038 (+0.0484)           |
| 1.2959     | 17000     | 2.0707        | -               | -                    | -                    | -                    | -                          |
| 1.3722     | 18000     | 2.0719        | -               | -                    | -                    | -                    | -                          |
| 1.4484     | 19000     | 2.0687        | -               | -                    | -                    | -                    | -                          |
| 1.5246     | 20000     | 2.0675        | 2.0730          | 0.5633 (+0.0228)     | 0.3264 (+0.0014)     | 0.5949 (+0.0943)     | 0.4949 (+0.0395)           |
| 1.6009     | 21000     | 2.0698        | -               | -                    | -                    | -                    | -                          |
| 1.6771     | 22000     | 2.0685        | -               | -                    | -                    | -                    | -                          |
| 1.7533     | 23000     | 2.0683        | -               | -                    | -                    | -                    | -                          |
| 1.8295     | 24000     | 2.0667        | 2.0731          | 0.5571 (+0.0166)     | 0.3521 (+0.0271)     | 0.6319 (+0.1313)     | 0.5137 (+0.0583)           |
| 1.9058     | 25000     | 2.0665        | -               | -                    | -                    | -                    | -                          |
| 1.9820     | 26000     | 2.0707        | -               | -                    | -                    | -                    | -                          |
| 2.0582     | 27000     | 2.0663        | -               | -                    | -                    | -                    | -                          |
| 2.1345     | 28000     | 2.0672        | 2.0739          | 0.5543 (+0.0139)     | 0.3346 (+0.0096)     | 0.5958 (+0.0952)     | 0.4949 (+0.0395)           |
| 2.2107     | 29000     | 2.0661        | -               | -                    | -                    | -                    | -                          |
| 2.2869     | 30000     | 2.0681        | -               | -                    | -                    | -                    | -                          |
| 2.3632     | 31000     | 2.0626        | -               | -                    | -                    | -                    | -                          |
| 2.4394     | 32000     | 2.0642        | 2.0745          | 0.5791 (+0.0387)     | 0.3347 (+0.0097)     | 0.6386 (+0.1380)     | 0.5175 (+0.0621)           |
| 2.5156     | 33000     | 2.0635        | -               | -                    | -                    | -                    | -                          |
| 2.5919     | 34000     | 2.0648        | -               | -                    | -                    | -                    | -                          |
| 2.6681     | 35000     | 2.0615        | -               | -                    | -                    | -                    | -                          |
| 2.7443     | 36000     | 2.0626        | 2.0736          | 0.5735 (+0.0331)     | 0.3288 (+0.0038)     | 0.6205 (+0.1198)     | 0.5076 (+0.0522)           |
| 2.8206     | 37000     | 2.0621        | -               | -                    | -                    | -                    | -                          |
| 2.8968     | 38000     | 2.0664        | -               | -                    | -                    | -                    | -                          |
| 2.9730     | 39000     | 2.0621        | -               | -                    | -                    | -                    | -                          |
| -1         | -1        | -             | -               | 0.5573 (+0.0168)     | 0.3623 (+0.0373)     | 0.6499 (+0.1492)     | 0.5231 (+0.0678)           |

* The bold row denotes the saved checkpoint.

### Environmental Impact
Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
- **Energy Consumed**: 0.529 kWh
- **Carbon Emitted**: 0.205 kg of CO2
- **Hours Used**: 1.686 hours
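
For context, a CodeCarbon measurement of this kind is typically taken by wrapping the training run in an `EmissionsTracker`; this is a minimal sketch, not the card's actual instrumentation:

```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker()
tracker.start()
# ... run training here ...
emissions_kg = tracker.stop()  # returns emissions in kg CO2eq
print(f"Emitted ~{emissions_kg:.3f} kg CO2eq")
```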

### Training Hardware
- **On Cloud**: No
- **GPU Model**: 1 x NVIDIA GeForce RTX 3090
- **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
- **RAM Size**: 31.78 GB

### Framework Versions
- Python: 3.11.6
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.48.3
- PyTorch: 2.5.0+cu121
- Accelerate: 1.4.0
- Datasets: 3.3.2
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### ListNetLoss
```bibtex
@inproceedings{cao2007learning,
    title={Learning to rank: from pairwise approach to listwise approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
@@ -0,0 +1,31 @@
{
  "_name_or_path": "microsoft/MiniLM-L12-H384-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.48.3",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
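
Since this config declares a `BertForSequenceClassification` head with a single label, the checkpoint can also be scored directly with plain `transformers`; a sketch, using the raw logit as the relevance score:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

repo = "tomaarsen/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-listnet-identity"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

# Query and passage are encoded together as one [CLS] query [SEP] passage [SEP] input
inputs = tokenizer(
    "How many calories in an egg",
    "Most of the calories in an egg come from the yellow yolk in the center.",
    return_tensors="pt",
)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()  # single relevance logit
print(score)
```
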
model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:81de3af766410b4cbc6ee23a4f9794b1586a4f7978b70b7a15440f768dfde6c1
size 133464836
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
vocab.txt ADDED
The diff for this file is too large to render. See raw diff