Update README.md
README.md
CHANGED
@@ -33,8 +33,6 @@ In particular, it might be easy to predict a *reasonable* next token, but much m
 The correct prediction here might be "signs of life." However, the model might predict "and" rather than "signs", since "and" is *reasonable* in the immediate context - it's grammatically correct, but implies a strange ending to the sentence.
 As a result, the model might end up with something like "The astronomer pointed his telescope at the distant star, hoping to see and hear." - which makes little sense.
 
----
-
 SPIN's advantage over SFT likely comes from its partial mitigation of exposure bias.
 SPIN doesn't only train the model to predict the next token accurately; it repeatedly trains the model to identify and fix discrepancies between its generations and the ground truth.
 In order to do this, the model must implicitly learn to think ahead, as exposure bias is likely what causes many of the discrepancies.
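
The "and" versus "signs" example in the README text can be made concrete with a small sketch. This is not SPIN and not taken from the repository: `TOY_MODEL`, its probabilities, and both helper functions are hypothetical, invented only to show how picking the most *reasonable* next token at each step can lead to a worse sentence than planning ahead.

```python
# Toy illustration only. The tokens and probabilities are invented for this
# sketch and do not come from any real model.

# Hypothetical next-token distributions, keyed by the most recent token.
# Tokens that never appear as a key end the sentence.
TOY_MODEL = {
    "see": {"and": 0.55, "signs": 0.45},
    "and": {"hear.": 0.4, "wait.": 0.3, "listen.": 0.3},
    "signs": {"of": 1.0},
    "of": {"life.": 0.9, "water.": 0.1},
}


def greedy_decode(model, start):
    """Pick the single most probable next token at every step."""
    tokens, prob, current = [], 1.0, start
    while current in model:
        next_token, p = max(model[current].items(), key=lambda kv: kv[1])
        tokens.append(next_token)
        prob *= p
        current = next_token
    return tokens, prob


def best_sequence(model, start):
    """Exhaustively search for the most probable complete continuation."""
    if start not in model:
        return [], 1.0
    best_tokens, best_prob = [], -1.0
    for next_token, p in model[start].items():
        tail, tail_prob = best_sequence(model, next_token)
        if p * tail_prob > best_prob:
            best_tokens, best_prob = [next_token] + tail, p * tail_prob
    return best_tokens, best_prob


greedy_tokens, greedy_prob = greedy_decode(TOY_MODEL, "see")
best_tokens, best_prob = best_sequence(TOY_MODEL, "see")

print("greedy:", " ".join(greedy_tokens))   # greedy: and hear.
print("planned:", " ".join(best_tokens))    # planned: signs of life.
print("greedy is less likely overall:", greedy_prob < best_prob)
```

In this toy setup, "and" wins the first step because it is locally the most probable token, but every continuation after it is weak, so the completed sentence is less probable than "signs of life." The exhaustive search is only feasible because the vocabulary is tiny; it stands in for the "think ahead" behaviour the text describes rather than for any practical decoding method.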