euclaise / ReMask-3B

euclaise committed on Mar 28, 2024
Commit bfcd1ce (verified) · 1 parent: 7b1caad

Update README.md

Files changed (1)

README.md +0 -2

README.md CHANGED

@@ -33,8 +33,6 @@ In particular, it might be easy to predict a *reasonable* next token, but much m
 The correct prediction here might be "signs of life." However, the model might predict "and" rather than "signs", since "and" is *reasonable* in the immediate context: it's grammatically correct, but implies a strange ending to the sentence.
 As a result, the model might end up with a sentence that makes little sense, such as "The astronomer pointed his telescope at the distant star, hoping to see and hear."
----
 SPIN's advantage over SFT likely comes from its partial mitigation of exposure bias.
 SPIN doesn't only train the model to predict the next token accurately; it repeatedly trains the model to identify and fix discrepancies between its own generations and the ground truth.
 To do this, the model must implicitly learn to think ahead, since exposure bias is likely what causes many of these discrepancies.
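To make the exposure-bias example concrete, here is a minimal Python sketch built on a hypothetical toy model (`TOY_MODEL`, a hand-written table mapping the previous token to next-token probabilities; the numbers are invented for illustration and have nothing to do with ReMask-3B). Under teacher forcing the model is always conditioned on the ground-truth prefix, so one bad guess ("and") costs a single token and the next step recovers. In free-running generation it is conditioned on its own output, so the same guess derails the rest of the sentence.

```python
# Hypothetical toy "model": previous token -> next-token probabilities.
# Probabilities are made up purely to illustrate exposure bias.
TOY_MODEL = {
    "see": {"and": 0.55, "signs": 0.45},  # "and" is locally reasonable, slightly preferred
    "signs": {"of": 1.0},
    "of": {"life.": 1.0},
    "and": {"hear.": 1.0},
}


def greedy_next(prev: str) -> str:
    """Return the most probable next token given the previous token."""
    dist = TOY_MODEL[prev]
    return max(dist, key=dist.get)


def teacher_forced_predictions(reference: list[str]) -> list[str]:
    """Training-time view: each prediction is conditioned on the ground-truth previous token."""
    return [greedy_next(prev) for prev in reference[:-1]]


def free_running(start: str, max_len: int) -> list[str]:
    """Inference-time view: each prediction is conditioned on the model's own previous output."""
    out = [start]
    while len(out) < max_len and out[-1] in TOY_MODEL:
        out.append(greedy_next(out[-1]))
    return out


reference = ["see", "signs", "of", "life."]
print(teacher_forced_predictions(reference))  # ['and', 'of', 'life.']
print(free_running("see", 4))                 # ['see', 'and', 'hear.']
```

Teacher-forced accuracy here is 2 of 3 tokens, yet free-running generation produces the "see and hear." ending, because the per-token training signal never conditions the model on its own mistakes.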

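A rough sketch of the per-example objective SPIN (Self-Play Fine-Tuning) optimizes, assuming the logistic-loss formulation from the SPIN paper: the model is pushed to raise the likelihood of the ground-truth response `y`, relative to the previous-iteration model, more than it raises the likelihood of the response `y'` that the previous iteration generated. This is an illustration, not ReMask-3B's training code; the function name `spin_loss`, its arguments, and the default `lam=0.1` are assumptions. In practice the inputs would be summed token log-probabilities from the current model and from a frozen copy of the previous-iteration model.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def spin_loss(
    logp_real: float,      # log p_theta(y | x): current model, ground-truth response
    logp_real_ref: float,  # log p_theta_t(y | x): previous-iteration model, ground truth
    logp_gen: float,       # log p_theta(y' | x): current model, previous model's generation
    logp_gen_ref: float,   # log p_theta_t(y' | x): previous-iteration model, its own generation
    lam: float = 0.1,      # scale lambda (illustrative value, an assumption)
) -> float:
    """Logistic SPIN-style loss: l(t) = log(1 + exp(-t)) applied to the scaled margin."""
    margin = lam * ((logp_real - logp_real_ref) - (logp_gen - logp_gen_ref))
    return softplus(-margin)


# Equal log-ratios give log(2); favouring the ground truth lowers the loss.
print(spin_loss(0.0, 0.0, 0.0, 0.0))
print(spin_loss(-1.0, -2.0, -3.0, -3.0))
```

This has the same functional form as the DPO loss, with the ground truth playing the "preferred" response and the model's own generation the "dispreferred" one. Regenerating `y'` from the updated model at each iteration is what repeatedly confronts the model with its own outputs.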