metadata
base_model: dunzhang/stella_en_1.5B_v5
library_name: sentence-transformers
metrics:
- cosine_accuracy@25
- cosine_precision@100
- cosine_precision@200
- cosine_precision@300
- cosine_precision@400
- cosine_precision@500
- cosine_precision@600
- cosine_precision@700
- cosine_precision@800
- cosine_precision@900
- cosine_precision@1000
- cosine_recall@100
- cosine_recall@200
- cosine_recall@300
- cosine_recall@400
- cosine_recall@500
- cosine_recall@600
- cosine_recall@700
- cosine_recall@800
- cosine_recall@900
- cosine_recall@1000
- cosine_ndcg@25
- cosine_mrr@25
- cosine_map@25
- dot_accuracy@25
- dot_precision@100
- dot_precision@200
- dot_precision@300
- dot_precision@400
- dot_precision@500
- dot_precision@600
- dot_precision@700
- dot_precision@800
- dot_precision@900
- dot_precision@1000
- dot_recall@100
- dot_recall@200
- dot_recall@300
- dot_recall@400
- dot_recall@500
- dot_recall@600
- dot_recall@700
- dot_recall@800
- dot_recall@900
- dot_recall@1000
- dot_ndcg@25
- dot_mrr@25
- dot_map@25
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:3999
- loss:CachedMultipleNegativesSymmetricRankingLoss
widget:
- source_sentence: >-
QuestionSummary: Adding and Subtracting Algebraic Fractions
Question: STEP \( 1 \)
Which of the options is a correct first step to express the following as a
single fraction?
\(
\frac{3}{x-2}-\frac{5}{x}
\)
CorrectAnswer: \( \frac{3 x}{x(x-2)}-\frac{5(x-2)}{x(x-2)} \)
Answer: \( \frac{3-5}{x-2-x} \)
sentences:
- When subtracting fractions, subtracts the numerators and denominators
- >-
Believes that the coefficient of x represents the gradient even when a
line is not in the form y = mx+c
- Confuses factors and multiples
- source_sentence: >-
QuestionSummary: Time
Question: Which of the following would correctly calculate the number of
seconds in \( 1 \) day?
CorrectAnswer: \( 24 \times 60 \times 60 \)
Answer: \( 24 \times 60 \)
sentences:
- Translates rather than reflects across a line of symmetry
- Does not understand the value of zeros as placeholders
- Converted hours to minutes instead of hours to seconds
- source_sentence: |-
QuestionSummary: Naming Co-ordinates in 2D
Question: Here are \( 3 \) vertices of a rectangle:
\((-6,-2),(-3,2),(0,0) \text {, }\)
What are the coordinates of the \( 4^{\text {th }} \) vertex?
CorrectAnswer: \( (-3,-4) \)
Answer: \( (-6,-4) \)
sentences:
- Thinks x = 1 at the x axis
- Does not know how to find the length of a line segment from coordinates
- Believes rounding numbers down would give an overestimate
- source_sentence: >-
QuestionSummary: Parts of a Circle
Question: What is the correct name for the line marked on the circle? ![\(
\theta \)]()
CorrectAnswer: Chord
Answer: Radius
sentences:
- >-
When completing the square, believes the constant in the bracket is
double the coefficient of x
- 'Cannot reflect shape when line of symmetry is diagonal '
- Confuses chord and radius
- source_sentence: |-
QuestionSummary: Function Machines
Question: Which of the following pairs of function machines are correct?
CorrectAnswer: \(a \Rightarrow \times2 \Rightarrow -5\Rightarrow 2a-5\)
\(a \Rightarrow -5 \Rightarrow \times2\Rightarrow 2(a-5)\)
Answer: \(a \Rightarrow \times2 \Rightarrow -5\Rightarrow 2a-5\)
\(a \Rightarrow \times2 \Rightarrow -5\Rightarrow 2(a-5)\)
sentences:
- >-
Does not follow the arrows through a function machine, changes the order
of the operations asked.
- Has used the wrong data point on the graph
- Incorrectly cancels what they believe is a factor in algebraic fractions
model-index:
- name: SentenceTransformer based on dunzhang/stella_en_1.5B_v5
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: val
type: val
metrics:
- type: cosine_accuracy@25
value: 0.6946039035591275
name: Cosine Accuracy@25
- type: cosine_precision@100
value: 0.008760045924225027
name: Cosine Precision@100
- type: cosine_precision@200
value: 0.004684270952927668
name: Cosine Precision@200
- type: cosine_precision@300
value: 0.003195560658247226
name: Cosine Precision@300
- type: cosine_precision@400
value: 0.0024339839265212397
name: Cosine Precision@400
- type: cosine_precision@500
value: 0.0019609644087256036
name: Cosine Precision@500
- type: cosine_precision@600
value: 0.0016437045541523154
name: Cosine Precision@600
- type: cosine_precision@700
value: 0.0014187305232081352
name: Cosine Precision@700
- type: cosine_precision@800
value: 0.001244259471871412
name: Cosine Precision@800
- type: cosine_precision@900
value: 0.0011085597652761834
name: Cosine Precision@900
- type: cosine_precision@1000
value: 0.0009977037887485653
name: Cosine Precision@1000
- type: cosine_recall@100
value: 0.8760045924225028
name: Cosine Recall@100
- type: cosine_recall@200
value: 0.9368541905855339
name: Cosine Recall@200
- type: cosine_recall@300
value: 0.9586681974741676
name: Cosine Recall@300
- type: cosine_recall@400
value: 0.9735935706084959
name: Cosine Recall@400
- type: cosine_recall@500
value: 0.9804822043628014
name: Cosine Recall@500
- type: cosine_recall@600
value: 0.9862227324913893
name: Cosine Recall@600
- type: cosine_recall@700
value: 0.9931113662456946
name: Cosine Recall@700
- type: cosine_recall@800
value: 0.9954075774971297
name: Cosine Recall@800
- type: cosine_recall@900
value: 0.9977037887485649
name: Cosine Recall@900
- type: cosine_recall@1000
value: 0.9977037887485649
name: Cosine Recall@1000
- type: cosine_ndcg@25
value: 0.35640204555925886
name: Cosine Ndcg@25
- type: cosine_mrr@25
value: 0.2610487357311384
name: Cosine Mrr@25
- type: cosine_map@25
value: 0.2610487357311386
name: Cosine Map@25
- type: dot_accuracy@25
value: 0.42709529276693453
name: Dot Accuracy@25
- type: dot_precision@100
value: 0.007600459242250287
name: Dot Precision@100
- type: dot_precision@200
value: 0.004328358208955224
name: Dot Precision@200
- type: dot_precision@300
value: 0.003076923076923077
name: Dot Precision@300
- type: dot_precision@400
value: 0.002359357060849598
name: Dot Precision@400
- type: dot_precision@500
value: 0.0019219288174512062
name: Dot Precision@500
- type: dot_precision@600
value: 0.0016188289322617683
name: Dot Precision@600
- type: dot_precision@700
value: 0.0013990487124815481
name: Dot Precision@700
- type: dot_precision@800
value: 0.0012299081515499426
name: Dot Precision@800
- type: dot_precision@900
value: 0.0010970787090190078
name: Dot Precision@900
- type: dot_precision@1000
value: 0.0009896670493685423
name: Dot Precision@1000
- type: dot_recall@100
value: 0.7600459242250287
name: Dot Recall@100
- type: dot_recall@200
value: 0.8656716417910447
name: Dot Recall@200
- type: dot_recall@300
value: 0.9230769230769231
name: Dot Recall@300
- type: dot_recall@400
value: 0.9437428243398392
name: Dot Recall@400
- type: dot_recall@500
value: 0.9609644087256027
name: Dot Recall@500
- type: dot_recall@600
value: 0.9712973593570609
name: Dot Recall@600
- type: dot_recall@700
value: 0.9793340987370838
name: Dot Recall@700
- type: dot_recall@800
value: 0.983926521239954
name: Dot Recall@800
- type: dot_recall@900
value: 0.9873708381171068
name: Dot Recall@900
- type: dot_recall@1000
value: 0.9896670493685419
name: Dot Recall@1000
- type: dot_ndcg@25
value: 0.1952544948998545
name: Dot Ndcg@25
- type: dot_mrr@25
value: 0.13285195280982043
name: Dot Mrr@25
- type: dot_map@25
value: 0.13285195280982032
name: Dot Map@25
SentenceTransformer based on dunzhang/stella_en_1.5B_v5
This is a sentence-transformers model finetuned from dunzhang/stella_en_1.5B_v5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: dunzhang/stella_en_1.5B_v5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: Qwen2Model
(1): Pooling({'word_embedding_dimension': 1536, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Dense({'in_features': 1536, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
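The three modules above can be read as: token embeddings from the transformer, mean pooling over non-padded tokens, then a linear projection from 1536 to 1024 features with an identity activation. The following is a minimal NumPy sketch of the last two stages only. The random token embeddings, the toy attention mask, and the helper names `mean_pool` and `dense_project` are illustrative assumptions, not part of the model or the library.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring padded positions."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def dense_project(pooled, weight, bias):
    """Linear layer with identity activation (here 1536 -> 1024)."""
    return pooled @ weight.T + bias


# Toy inputs standing in for the transformer output (assumed shapes only).
rng = np.random.default_rng(0)
batch, seq_len, hidden, out_dim = 2, 5, 1536, 1024
tokens = rng.normal(size=(batch, seq_len, hidden))
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])  # 0 marks padding
weight = rng.normal(size=(out_dim, hidden)) * 0.01
bias = np.zeros(out_dim)

pooled = mean_pool(tokens, mask)
sentence_embeddings = dense_project(pooled, weight, bias)
print(pooled.shape, sentence_embeddings.shape)
```

The pooled vector keeps the transformer's hidden size; only the dense layer changes the dimensionality to the 1024 reported in the model description.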
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'QuestionSummary: Function Machines\nQuestion: Which of the following pairs of function machines are correct?\nCorrectAnswer: \n\n\\(a \\Rightarrow -5 \\Rightarrow \\times2\\Rightarrow 2(a-5)\\) \nAnswer: \n\n\\(a \\Rightarrow \\times2 \\Rightarrow -5\\Rightarrow 2(a-5)\\) ',
'Does not follow the arrows through a function machine, changes the order of the operations asked.',
'Incorrectly cancels what they believe is a factor in algebraic fractions',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
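For retrieval-style use, the first embedding (the question with its correct and chosen answers) can be treated as a query and the remaining embeddings as candidate misconceptions, ranked by cosine similarity. The sketch below shows that ranking step with plain NumPy on toy vectors so that it runs without downloading the model. The helper name `rank_by_cosine` and the toy vectors are assumptions for illustration.

```python
import numpy as np


def rank_by_cosine(query_vec, candidate_vecs, top_k=25):
    """Return indices and scores of the top_k candidates by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)[:top_k]  # highest similarity first
    return order, scores[order]


# Toy vectors standing in for model.encode(...) output.
query = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [2.0, 0.1, 0.0],   # nearly parallel to the query
    [1.0, 1.0, 0.0],   # 45 degrees from the query
])
order, scores = rank_by_cosine(query, candidates, top_k=2)
print(order.tolist())
```

With real embeddings, `query` would correspond to `embeddings[0]` and `candidates` to `embeddings[1:]` from the snippet above.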
Evaluation
Metrics
Information Retrieval
- Dataset: val
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@25 | 0.6946 |
cosine_precision@100 | 0.0088 |
cosine_precision@200 | 0.0047 |
cosine_precision@300 | 0.0032 |
cosine_precision@400 | 0.0024 |
cosine_precision@500 | 0.002 |
cosine_precision@600 | 0.0016 |
cosine_precision@700 | 0.0014 |
cosine_precision@800 | 0.0012 |
cosine_precision@900 | 0.0011 |
cosine_precision@1000 | 0.001 |
cosine_recall@100 | 0.876 |
cosine_recall@200 | 0.9369 |
cosine_recall@300 | 0.9587 |
cosine_recall@400 | 0.9736 |
cosine_recall@500 | 0.9805 |
cosine_recall@600 | 0.9862 |
cosine_recall@700 | 0.9931 |
cosine_recall@800 | 0.9954 |
cosine_recall@900 | 0.9977 |
cosine_recall@1000 | 0.9977 |
cosine_ndcg@25 | 0.3564 |
cosine_mrr@25 | 0.261 |
cosine_map@25 | 0.261 |
dot_accuracy@25 | 0.4271 |
dot_precision@100 | 0.0076 |
dot_precision@200 | 0.0043 |
dot_precision@300 | 0.0031 |
dot_precision@400 | 0.0024 |
dot_precision@500 | 0.0019 |
dot_precision@600 | 0.0016 |
dot_precision@700 | 0.0014 |
dot_precision@800 | 0.0012 |
dot_precision@900 | 0.0011 |
dot_precision@1000 | 0.001 |
dot_recall@100 | 0.76 |
dot_recall@200 | 0.8657 |
dot_recall@300 | 0.9231 |
dot_recall@400 | 0.9437 |
dot_recall@500 | 0.961 |
dot_recall@600 | 0.9713 |
dot_recall@700 | 0.9793 |
dot_recall@800 | 0.9839 |
dot_recall@900 | 0.9874 |
dot_recall@1000 | 0.9897 |
dot_ndcg@25 | 0.1953 |
dot_mrr@25 | 0.1329 |
dot_map@25 | 0.1329 |
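The numbers in the table are consistent with each query having exactly one relevant document: precision@k equals recall@k divided by k (for example 0.876 / 100 = 0.0088), and MRR@25 equals MAP@25. The card does not state this, so treat it as an inference. Under that assumption, the metrics reduce to the short sketch below. The function name `retrieval_metrics` and the toy rankings are illustrative and are not the evaluator's own code.

```python
def retrieval_metrics(rankings, relevant, k):
    """Recall@k, precision@k and MRR@k when each query has one relevant id."""
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked_ids, relevant_id in zip(rankings, relevant):
        top = ranked_ids[:k]
        if relevant_id in top:
            hits += 1
            reciprocal_rank_sum += 1.0 / (top.index(relevant_id) + 1)
    n = len(rankings)
    recall = hits / n
    return {
        "recall": recall,
        "precision": recall / k,  # one relevant item among k retrieved
        "mrr": reciprocal_rank_sum / n,
    }


# Three toy queries; the relevant id is "a" for each of them.
rankings = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
relevant = ["a", "a", "a"]
metrics = retrieval_metrics(rankings, relevant, k=2)
print(metrics)
```

Because precision is bounded by 1/k in this setting, the very small precision values at k = 100 to 1000 are expected and recall is the more informative column.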
Training Details
Training Dataset
Unnamed Dataset
- Size: 3,999 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:
 | anchor | positive |
---|---|---|
type | string | string |
details | min: 30 tokens, mean: 87.03 tokens, max: 363 tokens | min: 4 tokens, mean: 13.84 tokens, max: 42 tokens |
- Samples:
- Loss: CachedMultipleNegativesSymmetricRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim", "mini_batch_size": 1 }
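As a rough illustration of what these parameters control, the sketch below computes a symmetric in-batch ranking loss in NumPy: cosine similarities between all anchors and positives are multiplied by `scale`, and cross-entropy is taken in both directions (anchor to positive and positive to anchor) and averaged. This is a simplified sketch of the idea, not the library's implementation. The cached variant is understood to produce the same loss while processing the batch in chunks of `mini_batch_size` to save memory, which is omitted here.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_ranking_loss(anchors, positives, scale=20.0):
    """In-batch negatives loss, averaged over both retrieval directions."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)          # scaled cosine similarity matrix
    idx = np.arange(len(a))             # the matching pair sits on the diagonal
    forward = -log_softmax(scores, axis=1)[idx, idx].mean()   # anchor -> positive
    backward = -log_softmax(scores, axis=0)[idx, idx].mean()  # positive -> anchor
    return (forward + backward) / 2


# Toy check: aligned pairs give a small loss, shuffled pairs a large one.
vectors = np.eye(3)
aligned = symmetric_ranking_loss(vectors, vectors)
shuffled = symmetric_ranking_loss(vectors, vectors[[1, 2, 0]])
print(aligned, shuffled)
```

Every other positive in the batch acts as a negative for a given anchor, which is one reason a large batch size is attractive for this kind of loss.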
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 1372
- per_device_eval_batch_size: 1372
- learning_rate: 4e-05
- num_train_epochs: 5
- warmup_ratio: 0.1
- save_only_model: True
- bf16: True
- load_best_model_at_end: True
- batch_sampler: no_duplicates
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 1372
- per_device_eval_batch_size: 1372
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 4e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: True
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | val_cosine_map@25 |
---|---|---|---|
0.3333 | 1 | 2.2717 | 0.1775 |
0.6667 | 2 | 2.1785 | 0.2300 |
1.0 | 3 | 1.4112 | 0.2651 |
1.3333 | 4 | 1.1861 | 0.2726 |
1.6667 | 5 | 0.8742 | 0.2813 |
2.0 | 6 | 0.8327 | 0.2818 |
2.3333 | 7 | 0.7626 | 0.2777 |
2.6667 | 8 | 0.5767 | 0.2752 |
3.0 | 9 | 0.493 | 0.2698 |
3.3333 | 10 | 0.5174 | 0.2654 |
3.6667 | 11 | 0.3906 | 0.2655 |
4.0 | 12 | 0.419 | 0.2627 |
4.3333 | 13 | 0.4394 | 0.2625 |
4.6667 | 14 | 0.5449 | 0.2612 |
5.0 | 15 | 0.3731 | 0.2610 |
- The bold row denotes the saved checkpoint.
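The step counts in the log follow from the dataset size and batch size: 3,999 samples at a per-device batch size of 1372 give 3 steps per epoch, so 5 epochs yield the 15 steps shown. The sketch below reproduces that arithmetic and a linear warmup-then-decay learning-rate schedule for a warmup ratio of 0.1. It assumes a single device, that the `no_duplicates` sampler does not change the number of batches, and that the warmup step count is rounded up; the helper names are hypothetical.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of optimizer steps per epoch without gradient accumulation."""
    return math.ceil(num_samples / batch_size)


def linear_warmup_lr(step, total_steps, warmup_steps, peak_lr):
    """Linear ramp up to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


per_epoch = steps_per_epoch(3999, 1372)
total = per_epoch * 5                 # num_train_epochs = 5
warmup = math.ceil(total * 0.1)       # warmup_ratio = 0.1 (rounding assumed)
schedule = [linear_warmup_lr(s, total, warmup, 4e-05) for s in range(total)]
print(per_epoch, total, warmup)
```

With so few optimizer steps, each logged row corresponds to one very large batch, which helps explain why the validation metric peaks around step 6 and then drifts down while the training loss keeps falling.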
Framework Versions
- Python: 3.10.13
- Sentence Transformers: 3.1.1
- Transformers: 4.45.1
- PyTorch: 2.2.0
- Accelerate: 0.34.2
- Datasets: 3.0.1
- Tokenizers: 0.20.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}