Gurveer05 committed (verified)
Commit 200e680 · Parent(s): 8a7f065

Add new SentenceTransformer model.

Files changed (4):
  1. README.md +197 -140
  2. config.json +1 -1
  3. config_sentence_transformers.json +1 -1
  4. model.safetensors +1 -1
README.md CHANGED
@@ -7,117 +7,180 @@ tags:
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
- - dataset_size:24420
  - loss:MultipleNegativesRankingLoss
  widget:
- - source_sentence: "Construct: Solve linear equations with the variable appearing\
- \ on both sides, with all positive integers.\n\nQuestion: Jo and Paul are arguing\
- \ about a first step to solve this equation:\n2 v-5=6 v-3\n\nJo says you can write:\
- \ 2 v-8=6 v \n\nPaul says you can write: 2 v=6 v-8 \n\nWho is correct?\n\nOptions:\n\
- A. Only\nJo\nB. Only\nPaul\nC. Both Jo and Paul\nD. Neither is correct\n\nAnswer:\
- \ Only\nJo"
- sentences:
- - Answers order of operations questions with brackets as if the brackets are not
- there
- - Goes the wrong direction in the sequence when identifying term-to-term rule
- - When solving an equation, uses the same operation rather than the inverse.
- - source_sentence: 'Construct: Round the elements of a calculation to one significant
- figure to perform an estimation.
-
-
- Question: Julie wants to estimate the answer to this calculation by rounding
- each number to 1 significant figure.
-
-
- [ 0.5841 x 36.3 ]
-
-
- What number should replace 0.5841 in her estimation?
-
-
  Options:
-
- A. 1
-
- B. 0.5000
-
- C. 0.6
-
- D. 0.5
-
-
- Answer: 0.5000'
  sentences:
- - Thinks you need to include the total as well as the individual parts when asked
- to write a ratio
- - Rounds down instead of up
- - When factorising a quadratic without a non variable term, tries to double bracket
- factorise
- - source_sentence: "Construct: Represent an equation using a bar model.\n\nQuestion:\
- \ Tom and Katie are discussing bar models. Tom says this bar model represents\
- \ 5 x \n 5 | x \n\n Katie says this bar model represents 5 x \n x | \
- \ x | x | x | x \n\n Who do you agree with?\n\nOptions:\nA. Only Tom\n\
- B. Only Katie\nC. Both Tom and Katie\nD. Neither is correct\n\nAnswer: Neither\
- \ is correct"
  sentences:
- - Does not understand that a probability of 1 represents something that is certain
- - Does not understand bar modelling in algebra
- - Does not understand that you can have percentages greater than 100%
- - source_sentence: 'Construct: Find another power of 10 more than a given number.
-
-
- Question: What number is 10,000 more than 298,603 ?
-
-
  Options:
-
- A. 398,603
-
- B. 2,986,030,000
-
- C. 208,603
-
- D. 308,603
-
-
- Answer: 208,603'
  sentences:
- - When solving an equation, uses the same operation rather than the inverse.
- - Cannot estimate the relative volume order, for different objects
- - When two digits sum to 10 or more during an addition problem, does not add one
- to the preceding digit
- - source_sentence: 'Construct: Interpret a pie chart.
-
-
- Question: A darts player hits his target 40 % of the time. Which pie chart
- represents his hits and misses?
-
-
  Options:
-
- A. Pie chart showing hits highlighted with 320 degrees of the chart and misses
- with 40 degrees.
-
- B. Pie chart showing hits with 2/3 of the chart and misses, highlighted in red,
- with 1/3
-
- C. Pie chart showing hits, in white, with 320 degrees of the chart and misses,
- highlighted in red, with 40 degrees.
-
- D. Pie chart showing hits with just over 1/3 of the chart, highlighted in red
- and misses, in white, with just under 2/3
-
-
- Answer: Pie chart showing hits with 2/3 of the chart and misses, highlighted in
- red, with 1/3'
  sentences:
- - Answers as if there are 100 minutes in an hour when changing from hours to minutes
- - Thinks x = y is an axis
- - Does not know how to read information off a pie chart
  ---
 
  # SentenceTransformer based on Alibaba-NLP/gte-base-en-v1.5
@@ -170,9 +233,9 @@ from sentence_transformers import SentenceTransformer
  model = SentenceTransformer("Gurveer05/gte-base-eedi-2024")
  # Run inference
  sentences = [
- 'Construct: Interpret a pie chart.\n\nQuestion: A darts player hits his target 40 % of the time. Which pie chart represents his hits and misses?\n\nOptions:\nA. Pie chart showing hits highlighted with 320 degrees of the chart and misses with 40 degrees.\nB. Pie chart showing hits with 2/3 of the chart and misses, highlighted in red, with 1/3\nC. Pie chart showing hits, in white, with 320 degrees of the chart and misses, highlighted in red, with 40 degrees.\nD. Pie chart showing hits with just over 1/3 of the chart, highlighted in red and misses, in white, with just under 2/3\n\nAnswer: Pie chart showing hits with 2/3 of the chart and misses, highlighted in red, with 1/3',
- 'Does not know how to read information off a pie chart',
- 'Thinks x = y is an axis',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
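The `embeddings` array above has one 768-dimensional row per input sentence, and rows are compared with cosine similarity (what `model.similarity(embeddings, embeddings)` computes by default in Sentence Transformers 3.x). A minimal NumPy sketch of that comparison, with small toy vectors standing in for the real embeddings:

```python
import numpy as np

def pairwise_cosine(embeddings: np.ndarray) -> np.ndarray:
    # L2-normalise each row, then take dot products: cos(a, b) = a.b / (|a||b|)
    normalized = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normalized @ normalized.T

# Toy stand-ins for the 768-dimensional sentence embeddings
emb = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
sims = pairwise_cosine(emb)
print(sims.shape)  # (3, 3); the diagonal is 1.0
```

Higher scores mean the query text and a candidate misconception sit closer in the embedding space.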
@@ -227,19 +290,19 @@ You can finetune this model on your own dataset.
  #### csv
 
  * Dataset: csv
- * Size: 24,420 training samples
- * Columns: <code>qa_pair_text</code>, <code>MisconceptionName</code>, and <code>negative</code>
  * Approximate statistics based on the first 1000 samples:
- | | qa_pair_text | MisconceptionName | negative |
- |:--|:--|:--|:--|
- | type | string | string | string |
- | details | <ul><li>min: 35 tokens</li><li>mean: 92.3 tokens</li><li>max: 507 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 15.5 tokens</li><li>max: 36 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 15.1 tokens</li><li>max: 39 tokens</li></ul> |
  * Samples:
- | qa_pair_text | MisconceptionName | negative |
- |:--|:--|:--|
- | <code>Construct: Find missing numbers in a fraction multiplication.<br><br>Question: (3 / 5) x ?=(1 / 2).<br><br>Options:<br>A. (2 / 3)<br>B. (5 / 6)<br>C. (3 / 10)<br>D. There are no values that make this calculation work<br><br>Answer: There are no values that make this calculation work</code> | <code>Does not realise that division can be used to find the missing value in a multiplication problem</code> | <code>Finds the unit fraction when asked to find a non-unit fraction of an amount</code> |
- | <code>Construct: Find missing angles in a scalene triangle.<br><br>Question: What is the size of angle p ? A triangle with angles labelled 49 degrees, 51 degrees and p [not to scale].<br><br>Options:<br>A. 51°<br>B. 70°<br>C. 80°<br>D. Not enough information<br><br>Answer: 51°</code> | <code>Thinks angles which look the same on diagram will be equal</code> | <code>Fails to use the properties of simple shapes to work out missing lengths</code> |
- | <code>Construct: Simplify an algebraic fraction by factorising the numerator.<br><br>Question: Simplify the following, if possible: (m^2+2 m-3 / m-3).<br><br>Options:<br>A. m+1<br>B. m+2<br>C. m-1<br>D. Does not simplify<br><br>Answer: m+2</code> | <code>Thinks that when you cancel identical terms from the numerator and denominator, they just disappear</code> | <code>Believes you add the whole to the numerators, ignoring denominators, when adding a mixed number to a proper fraction with the same denominator.</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
@@ -253,19 +316,19 @@ You can finetune this model on your own dataset.
  #### csv
 
  * Dataset: csv
- * Size: 19,280 evaluation samples
- * Columns: <code>qa_pair_text</code>, <code>MisconceptionName</code>, and <code>negative</code>
  * Approximate statistics based on the first 1000 samples:
- | | qa_pair_text | MisconceptionName | negative |
- |:--|:--|:--|:--|
- | type | string | string | string |
- | details | <ul><li>min: 35 tokens</li><li>mean: 94.39 tokens</li><li>max: 903 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 14.31 tokens</li><li>max: 38 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 14.7 tokens</li><li>max: 39 tokens</li></ul> |
  * Samples:
- | qa_pair_text | MisconceptionName | negative |
- |:--|:--|:--|
- | <code>Construct: Convert from hours to minutes.<br><br>Question: 3 hours is the same as ___________ minutes.<br><br>Options:<br>A. 180<br>B. 90<br>C. 30<br>D. 300<br><br>Answer: 30</code> | <code>Thinks there are 10 minutes in an hour</code> | <code>Incorrectly converted cm to m</code> |
- | <code>Construct: Rearrange formulae to change the subject where the subject appears once and two steps are needed.<br><br>Question: Rearrange the following equation to make v the subject<br>F=(v^2 / r).<br><br>Options:<br>A. v=r F^2<br>B. v=square root of ((F / r))<br>C. v=square root of (r F)<br>D. v=r square root of (F)<br><br>Answer: v=r F^2</code> | <code>When solving an equation, uses the same operation rather than the inverse.</code> | <code>Confuses the lines y=x and y=-x</code> |
- | <code>Construct: Use the order of operations to carry out calculations involving powers.<br><br>Question: 2 x 3^2=.<br><br>Options:<br>A. 18<br>B. 12<br>C. 36<br>D. 11<br><br>Answer: 12</code> | <code>Mixes up squaring and multiplying by 2 or doubling</code> | <code>Thinks they solve mx + c = a, by substituting in the value of a for x and therefore calculating ma + c</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
@@ -278,15 +341,18 @@ You can finetune this model on your own dataset.
  #### Non-Default Hyperparameters
 
  - `eval_strategy`: steps
- - `gradient_accumulation_steps`: 128
- - `learning_rate`: 5e-06
  - `weight_decay`: 0.01
  - `num_train_epochs`: 20
- - `lr_scheduler_type`: cosine_with_restarts
  - `lr_scheduler_kwargs`: {'num_cycles': 10}
  - `warmup_ratio`: 0.1
  - `fp16`: True
  - `load_best_model_at_end`: True
  - `batch_sampler`: no_duplicates
 
  #### All Hyperparameters
@@ -296,14 +362,14 @@ You can finetune this model on your own dataset.
  - `do_predict`: False
  - `eval_strategy`: steps
  - `prediction_loss_only`: True
- - `per_device_train_batch_size`: 8
- - `per_device_eval_batch_size`: 8
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
- - `gradient_accumulation_steps`: 128
  - `eval_accumulation_steps`: None
  - `torch_empty_cache_steps`: None
- - `learning_rate`: 5e-06
  - `weight_decay`: 0.01
  - `adam_beta1`: 0.9
  - `adam_beta2`: 0.999
@@ -311,7 +377,7 @@ You can finetune this model on your own dataset.
  - `max_grad_norm`: 1.0
  - `num_train_epochs`: 20
  - `max_steps`: -1
- - `lr_scheduler_type`: cosine_with_restarts
  - `lr_scheduler_kwargs`: {'num_cycles': 10}
  - `warmup_ratio`: 0.1
  - `warmup_steps`: 0
@@ -376,8 +442,8 @@ You can finetune this model on your own dataset.
  - `hub_strategy`: every_save
  - `hub_private_repo`: False
  - `hub_always_push`: False
- - `gradient_checkpointing`: False
- - `gradient_checkpointing_kwargs`: None
  - `include_inputs_for_metrics`: False
  - `eval_do_concat_batches`: True
  - `fp16_backend`: auto
@@ -407,51 +473,42 @@ You can finetune this model on your own dataset.
  </details>
 
  ### Training Logs
- | Epoch | Step | Training Loss | loss |
- |:-----------:|:-------:|:-------------:|:----------:|
- | 0.5029 | 6 | 1.77 | - |
- | 1.0059 | 12 | 1.6522 | 1.7033 |
- | 1.5088 | 18 | 1.425 | - |
- | 2.0118 | 24 | 1.3033 | 1.4015 |
- | 2.5147 | 30 | 1.1269 | - |
- | 3.0177 | 36 | 1.0721 | 1.2334 |
- | 3.5206 | 42 | 1.003 | - |
- | 4.0236 | 48 | 0.9973 | 1.1137 |
- | 4.5265 | 54 | 0.884 | - |
- | 5.0295 | 60 | 0.8756 | 1.0447 |
- | 5.5324 | 66 | 0.8231 | - |
- | 6.0354 | 72 | 0.7954 | 0.9564 |
- | 6.5383 | 78 | 0.724 | - |
- | 7.0413 | 84 | 0.7424 | 0.9185 |
- | 7.5442 | 90 | 0.6681 | - |
- | 8.0472 | 96 | 0.6536 | 0.8554 |
- | 8.5501 | 102 | 0.6139 | - |
- | 9.0530 | 108 | 0.618 | 0.8165 |
- | 9.5560 | 114 | 0.5491 | - |
- | 10.0589 | 120 | 0.5586 | 0.7942 |
- | 10.5619 | 126 | 0.5173 | - |
- | 11.0648 | 132 | 0.5039 | 0.7495 |
- | 11.5678 | 138 | 0.4599 | - |
- | 12.0707 | 144 | 0.4761 | 0.7338 |
- | 12.5737 | 150 | 0.4293 | - |
- | 13.0766 | 156 | 0.4226 | 0.7111 |
- | 13.5796 | 162 | 0.3967 | - |
- | 14.0825 | 168 | 0.4 | 0.6934 |
- | 14.5855 | 174 | 0.3568 | - |
- | 15.0884 | 180 | 0.3634 | 0.6852 |
- | 15.5914 | 186 | 0.3412 | - |
- | 16.0943 | 192 | 0.3374 | 0.6702 |
- | 16.5972 | 198 | 0.3127 | - |
- | 17.1002 | 204 | 0.3235 | 0.6611 |
- | 17.6031 | 210 | 0.2903 | - |
- | **18.1061** | **216** | **0.2943** | **0.6571** |
 
  * The bold row denotes the saved checkpoint.
 
  ### Framework Versions
  - Python: 3.10.14
  - Sentence Transformers: 3.1.1
- - Transformers: 4.44.2
  - PyTorch: 2.4.0
  - Accelerate: 0.33.0
  - Datasets: 2.19.2
 
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
+ - dataset_size:2442
  - loss:MultipleNegativesRankingLoss
  widget:
+ - source_sentence: 'Construct: Interpret sloping linear sections of a displacement-time
+ graph.
+
+
+ Question: This graph shows how far Fido the dog is from his home.
+
+ What might the negative-sloping section represent? A graph with time (secs) on
+ the horizontal axis and distance (m) on the vertical axis. The graph starts at
+ the origin, travels in a straight line up and right, travels horizontally, then
+ travels in a straight line down and right back to the x-axis.
+
+
+ Options:
+
+ A. Fido is walking home
+
+ B. Fido has fallen asleep
+
+ C. Fido is accelerating
+
+ D. Fido is walking away from home
+
+
+ Correct Answer: Fido is walking home
+
+
+ Incorrect Answer: Fido is walking away from home
+
+
+ Predicted Misconception: Negative slope indicates movement away, not towards a
+ starting point.'
+ sentences:
+ - Does not realise you can use equivalent fractions to break fractions up into smaller
+ divisions
+ - Divides by the order of the root
+ - Believes a downward slope on a distance-time graph means travelling away
+ - source_sentence: 'Construct: Identify reflex angles.
+
+
+ Question: An angle measures 192°.
+
+ This means it is...
+
+
  Options:
+
+ A. Acute
+
+ B. Obtuse
+
+ C. Reflex
+
+ D. A right angle
+
+
+ Correct Answer: Reflex
+
+
+ Incorrect Answer: Obtuse
+
+
+ Predicted Misconception: Believing an angle greater than 180 degrees but less
+ than 360 degrees is obtuse.'
  sentences:
+ - Multiplies rather than divides
+ - Confuses factors and multiples
+ - Does not understand that an obtuse angle is between 90 and 180 degrees
+ - source_sentence: 'Construct: Solve quadratic equations using factorisation in the
+ form (x + a)(x + b).
+
+
+ Question: In which region would x^2-10 x-25=0 belong? A Venn diagram made up
+ of two overlapping circles. One is labelled ''Factorises'' and the other is labelled
+ ''Has one solution equal to 0''.
+
+ A is in the ''Factorises'' circle only, B is in the overlap of the two circles,
+ C is in the ''Has one solution equal to 0'' circle only, and D is outside the
+ circles.
+
+
+ Options:
+
+ A. A
+
+ B. B
+
+ C. C
+
+ D. D
+
+
+ Correct Answer: D
+
+
+ Incorrect Answer: C
+
+
+ Predicted Misconception: Equating factorization with having a solution equal to
+ zero.'
  sentences:
+ - Believes all quadratic equations have a solution of 0
+ - Believes order of operations does not affect the answer to a calculation
+ - Does not realise that the sum of the two shorter sides must be greater than the
+ third side for it to be a possible triangle
+ - source_sentence: 'Construct: Factorise a quadratic expression in the form ax² +
+ bx + c where a is prime.
+
+
+ Question: Step 1: Factorise the following expression
+
+ ( 3 x^2+5 x+2 ).
+
+
  Options:
+
+ A. (3 x+2)(3 x+1)
+
+ B. (3 x+2)(x+1)
+
+ C. Cannot be factorised
+
+ D. (3 x+1)(x+2)
+
+
+ Correct Answer: (3 x+2)(x+1)
+
+
+ Incorrect Answer: (3 x+2)(3 x+1)
+
+
+ Predicted Misconception: Belief that all quadratic expressions with prime coefficients
+ can only be factorized using prime numbers in both factors.'
  sentences:
+ - Does not divide by 2 when calculating the area of a trapezium
+ - Mixes up squaring and multiplying by 2 or doubling
+ - When factorising a quadratic with a non-unit coefficient of x squared, believes
+ that coefficient will be in front of both x terms in the factorised form
+ - source_sentence: 'Construct: Substitute negative integer values into expressions
+ involving no powers or roots.
+
+
+ Question: If d=-2 what is the value of 10-2 d ?
+
+
  Options:
+
+ A. -12
+
+ B. 6
+
+ C. 14
+
+ D. 32
+
+
+ Correct Answer: 14
+
+
+ Incorrect Answer: 6
+
+
+ Predicted Misconception: Incorrectly subtracting instead of multiplying the variable.'
  sentences:
+ - Believes multiplying two negatives gives a negative answer
+ - Includes the x variable when giving the equation of a horizontal line
+ - Believes multiplying two negatives gives a negative answer
  ---
 
  # SentenceTransformer based on Alibaba-NLP/gte-base-en-v1.5
 
  model = SentenceTransformer("Gurveer05/gte-base-eedi-2024")
  # Run inference
  sentences = [
+ 'Construct: Substitute negative integer values into expressions involving no powers or roots.\n\nQuestion: If d=-2 what is the value of 10-2 d ?\n\nOptions:\nA. -12\nB. 6\nC. 14\nD. 32\n\nCorrect Answer: 14\n\nIncorrect Answer: 6\n\nPredicted Misconception: Incorrectly subtracting instead of multiplying the variable.',
+ 'Believes multiplying two negatives gives a negative answer',
+ 'Believes multiplying two negatives gives a negative answer',
  ]
  embeddings = model.encode(sentences)
  print(embeddings.shape)
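For the retrieval task this model targets (matching an Eedi question–answer pair against a bank of misconception names), the query embedding is scored against every candidate embedding and the top matches are returned. A hedged sketch of that ranking step; the helper name and toy vectors are illustrative, not part of the card, and in practice the inputs would be rows of `model.encode(...)`:

```python
import numpy as np

def rank_misconceptions(query_emb: np.ndarray, candidate_embs: np.ndarray, k: int = 3):
    """Return indices and scores of the k candidates closest to the query
    by cosine similarity, best match first."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity of each candidate
    top = np.argsort(-scores)[:k]       # highest scores first
    return top.tolist(), scores[top].tolist()

# Toy 3-dimensional stand-ins: candidate 1 points the same way as the query
query = np.array([0.0, 1.0, 0.0])
candidates = np.array([[1.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0],
                       [0.0, 1.0, 1.0]])
indices, scores = rank_misconceptions(query, candidates, k=2)
print(indices)  # [1, 2]
```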
 
  #### csv
 
  * Dataset: csv
+ * Size: 2,442 training samples
+ * Columns: <code>qa_pair_text</code> and <code>MisconceptionName</code>
  * Approximate statistics based on the first 1000 samples:
+ | | qa_pair_text | MisconceptionName |
+ |:--|:--|:--|
+ | type | string | string |
+ | details | <ul><li>min: 57 tokens</li><li>mean: 121.87 tokens</li><li>max: 621 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 15.09 tokens</li><li>max: 40 tokens</li></ul> |
  * Samples:
+ | qa_pair_text | MisconceptionName |
+ |:--|:--|
+ | <code>Construct: Solve two-step linear equations, with the variable on one side, with all positive integers.<br><br>Question: Tom and Katie are discussing how to solve:<br>(<br>(8 x / 5)=40<br>)<br><br>Tom says a correct next line of working could be: 8 x = 8 <br>Katie says a correct next line of working could be: (x / 5)=5 <br>Who is correct?<br><br>Options:<br>A. Only<br>Tom<br>B. Only Katie<br>C. Both Tom and Katie<br>D. Neither is correct<br><br>Correct Answer: Only Katie<br><br>Incorrect Answer: Neither is correct<br><br>Predicted Misconception: Multiplying both sides by 5 instead of dividing by 8 to isolate x.</code> | <code>When dividing a fraction by an integer, divides both the numerator and denominator by the integer</code> |
+ | <code>Construct: Rearrange a quadratic equation so that it is in the correct form to be factorised.<br><br>Question: What would be the most useful first step if we wanted to solve the following quadratic equation? x^2+7 x = 8.<br><br>Options:<br>A. Divide by x<br>B. Square root both sides of the equation<br>C. Subtract 7 x from both sides of the equation<br>D. Subtract 8 from both sides of the equation<br><br>Correct Answer: Subtract 8 from both sides of the equation<br><br>Incorrect Answer: Subtract 7 x from both sides of the equation<br><br>Predicted Misconception: Subtracting terms incorrectly from both sides to isolate the constant term.</code> | <code>Does not realise a quadratic must be in the form ax^2+bx+c=0 to be factorised</code> |
+ | <code>Construct: Multiply proper fractions in the form: Fraction × Integer.<br><br>Question: (1 / 2) x 3=.<br><br>Options:<br>A. (3 / 2)<br>B. 3 (1 / 2)<br>C. (3 / 6)<br>D. (1 / 6)<br><br>Correct Answer: (3 / 2)<br><br>Incorrect Answer: (3 / 6)<br><br>Predicted Misconception: Multiplying a fraction by an integer results in a larger fraction, not a smaller one.</code> | <code>When multiplying fractions, multiplies both the numerator and denominator</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
 
  #### csv
 
  * Dataset: csv
+ * Size: 1,928 evaluation samples
+ * Columns: <code>qa_pair_text</code> and <code>MisconceptionName</code>
  * Approximate statistics based on the first 1000 samples:
+ | | qa_pair_text | MisconceptionName |
+ |:--|:--|:--|
+ | type | string | string |
+ | details | <ul><li>min: 52 tokens</li><li>mean: 125.81 tokens</li><li>max: 1093 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 14.45 tokens</li><li>max: 39 tokens</li></ul> |
  * Samples:
+ | qa_pair_text | MisconceptionName |
+ |:--|:--|
+ | <code>Construct: Calculate the square root of a number.<br><br>Question: What is the square root of nine?<br><br>Options:<br>A. 3<br>B. 4.5<br>C. 81<br>D. 18<br><br>Correct Answer: 3<br><br>Incorrect Answer: 4.5<br><br>Predicted Misconception: Believing square roots result in non-integer values when the original number is a perfect square.</code> | <code>Halves when asked to find the square root</code> |
+ | <code>Construct: Read values off a real life graph.<br><br>Question: A linear graph showing that 10 miles = £8. The graph can be used to work out how much Kay's company pays her for travel.<br><br>Kay's company paid her £ 80 <br><br>How many miles did she travel?<br><br>Options:<br>A. 96<br>B. 100<br>C. 64<br>D. 80<br><br>Correct Answer: 100<br><br>Incorrect Answer: 80<br><br>Predicted Misconception: Assuming a direct correlation without calculating the proportional relationship between miles and cost.</code> | <code>Believes direct proportion means a 1:1 ratio</code> |
+ | <code>Construct: Calculate compound area involving just rectangles and squares, where the dimensions are given in the same units.<br><br>Question: What is the area of this compound shape made with rectangles? Compound shape made of two rectangles with the sides labelled 15 cm, 12 cm, 7 cm and 7 cm. Two sides are unlabelled.<br><br>Options:<br>A. 124 cm^2<br>B. 154 cm^2<br>C. 180 cm^2<br>D. 189 cm^2<br><br>Correct Answer: 124 cm^2<br><br>Incorrect Answer: 154 cm^2<br><br>Predicted Misconception: Incorrectly calculating the area of one rectangle and adding it to the other without considering the overlapping section.</code> | <code>Makes an assumption about line segments being equal within a shape</code> |
  * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
 
  #### Non-Default Hyperparameters
 
  - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `gradient_accumulation_steps`: 8
  - `weight_decay`: 0.01
  - `num_train_epochs`: 20
+ - `lr_scheduler_type`: cosine
  - `lr_scheduler_kwargs`: {'num_cycles': 10}
  - `warmup_ratio`: 0.1
  - `fp16`: True
  - `load_best_model_at_end`: True
+ - `gradient_checkpointing`: True
+ - `gradient_checkpointing_kwargs`: {'use_reentrant': False}
  - `batch_sampler`: no_duplicates
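The batch settings above combine: `per_device_train_batch_size: 64` with `gradient_accumulation_steps: 8` gives an effective optimisation batch of 512. Note that accumulation only smooths the gradient update; the in-batch negatives for MultipleNegativesRankingLoss still come from each forward pass of 64. A quick sketch of the arithmetic (the single-device assumption is ours; the card does not state the GPU count):

```python
# Effective optimisation batch size implied by the hyperparameters above.
per_device_train_batch_size = 64
gradient_accumulation_steps = 8
num_devices = 1  # assumption, not stated in the card

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_devices)
print(effective_batch_size)  # 512
```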
 
  #### All Hyperparameters
 
  - `do_predict`: False
  - `eval_strategy`: steps
  - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 8
  - `eval_accumulation_steps`: None
  - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
  - `weight_decay`: 0.01
  - `adam_beta1`: 0.9
  - `adam_beta2`: 0.999
 
  - `max_grad_norm`: 1.0
  - `num_train_epochs`: 20
  - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
  - `lr_scheduler_kwargs`: {'num_cycles': 10}
  - `warmup_ratio`: 0.1
  - `warmup_steps`: 0
 
  - `hub_strategy`: every_save
  - `hub_private_repo`: False
  - `hub_always_push`: False
+ - `gradient_checkpointing`: True
+ - `gradient_checkpointing_kwargs`: {'use_reentrant': False}
  - `include_inputs_for_metrics`: False
  - `eval_do_concat_batches`: True
  - `fp16_backend`: auto
 
  </details>
 
  ### Training Logs
+ | Epoch | Step | Training Loss | loss |
+ |:--------:|:------:|:-------------:|:----------:|
+ | 0.8 | 2 | 2.6135 | - |
+ | 1.2 | 3 | - | 1.0560 |
+ | 1.6 | 4 | 2.058 | - |
+ | 2.4 | 6 | 1.7173 | 0.8711 |
+ | 3.2 | 8 | 1.5537 | - |
+ | 3.6 | 9 | - | 0.7901 |
+ | 4.0 | 10 | 1.4489 | - |
+ | 4.8 | 12 | 1.4622 | 0.7437 |
+ | 5.6 | 14 | 1.2437 | - |
+ | 6.0 | 15 | - | 0.7079 |
+ | 6.4 | 16 | 1.1761 | - |
+ | 7.2 | 18 | 1.0282 | 0.6748 |
+ | 8.0 | 20 | 0.9983 | - |
+ | 8.4 | 21 | - | 0.6437 |
+ | 8.8 | 22 | 0.9676 | - |
+ | 9.6 | 24 | 0.8342 | 0.6169 |
+ | 10.4 | 26 | 0.7937 | - |
+ | 10.8 | 27 | - | 0.5950 |
+ | 11.2 | 28 | 0.6869 | - |
+ | 12.0 | 30 | 0.6558 | 0.5807 |
+ | 12.8 | 32 | 0.6286 | - |
+ | 13.2 | 33 | - | 0.5732 |
+ | 13.6 | 34 | 0.5468 | - |
+ | **14.4** | **36** | **0.4923** | **0.5694** |
+ | 15.2 | 38 | 0.4477 | - |
+ | 15.6 | 39 | - | 0.5727 |
+ | 16.0 | 40 | 0.4108 | - |
 
  * The bold row denotes the saved checkpoint.
 
  ### Framework Versions
  - Python: 3.10.14
  - Sentence Transformers: 3.1.1
+ - Transformers: 4.44.0
  - PyTorch: 2.4.0
  - Accelerate: 0.33.0
  - Datasets: 2.19.2
config.json CHANGED
@@ -36,7 +36,7 @@
  },
  "rope_theta": 500000,
  "torch_dtype": "float32",
- "transformers_version": "4.44.2",
  "type_vocab_size": 0,
  "unpad_inputs": false,
  "use_memory_efficient_attention": false,
 
  },
  "rope_theta": 500000,
  "torch_dtype": "float32",
+ "transformers_version": "4.44.0",
  "type_vocab_size": 0,
  "unpad_inputs": false,
  "use_memory_efficient_attention": false,
config_sentence_transformers.json CHANGED
@@ -1,7 +1,7 @@
  {
  "__version__": {
  "sentence_transformers": "3.1.1",
- "transformers": "4.44.2",
  "pytorch": "2.4.0"
  },
  "prompts": {},
 
  {
  "__version__": {
  "sentence_transformers": "3.1.1",
+ "transformers": "4.44.0",
  "pytorch": "2.4.0"
  },
  "prompts": {},
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:37c8c47929fe161302244502858129819816a608e0b3f1baa92877d63de1cde0
  size 547119128
 
  version https://git-lfs.github.com/spec/v1
+ oid sha256:665965611c704f9f1f96ba10bbd209db8453d86adc1022de8aa4c91eb0e2eaca
  size 547119128