Milan Straka committed
Commit 20ab64f
1 Parent(s): 41cfe08
Describe that we dropped the pooler in v1.1.

README.md CHANGED
@@ -14,9 +14,10 @@ tags:
 ## Version History
 
 - **version 1.1**: Version 1.1 was released in Jan 2024, with a change to the
-tokenizer; the model parameters were mostly kept the same, but
-were enlarged (by copying suitable rows) to correspond to
-tokenizer
+tokenizer described below; the model parameters were mostly kept the same, but
+(a) the embeddings were enlarged (by copying suitable rows) to correspond to
+the updated tokenizer, (b) the pooler was dropped (originally it was only
+randomly initialized).
 
 The tokenizer in the initial release (a) contained a hole (51959 did not
 correspond to any token), and (b) mapped several tokens (unseen during training
@@ -29,8 +30,9 @@ tags:
 mapping all tokens to a unique ID. That also required increasing the
 vocabulary size and embeddings weights (by replicating the embedding of the
 `[UNK]` token). Without finetuning, version 1.1 and version 1.0 gives exactly
-the same
+the same embeddings on any input (apart from the pooler missing in v1.1),
+and the tokens in version 1.0 that mapped to a different ID than the `[UNK]`
+token map to the same ID in version 1.1.
 
 However, the sizes of the embeddings (and LM head weights and biases) are
 different, so the weights of the version 1.1 are not compatible with the
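The tokenizer problems named in the diff (a hole at an unused ID and several tokens sharing the `[UNK]` ID) can be checked mechanically. The sketch below is illustrative only: the repository id is a placeholder, and whether the collisions surface this way depends on how the tokenizer files encode the token-to-id mapping.

```python
from collections import Counter

from transformers import AutoTokenizer

MODEL_ID = "org/model"  # placeholder: repository id of the checkpoint to inspect
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

vocab = tokenizer.get_vocab()   # maps token string -> integer id
ids = list(vocab.values())

# A "hole" is an id within the nominal range that no token maps to.
holes = sorted(set(range(max(ids) + 1)) - set(ids))

# A collision is an id shared by several distinct token strings
# (in version 1.0, some tokens unseen during training shared the `[UNK]` id).
collisions = {i: n for i, n in Counter(ids).items() if n > 1}

print("unused ids (holes):", holes)
print("ids used by more than one token:", collisions)
```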
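The "enlarged (by copying suitable rows)" step can likewise be illustrated with a short sketch. This is not the script used to produce version 1.1; the repository id, the target vocabulary size, and the output directory are placeholders, and the real change also covered the LM head weights and biases and the hole in the vocabulary. It only shows the general technique: grow the embedding matrix and fill the new rows with copies of the `[UNK]` embedding so that previously colliding tokens keep their old representation.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

MODEL_ID = "org/model"      # placeholder: the version 1.0 checkpoint
NEW_VOCAB_SIZE = 52_000     # placeholder: vocabulary size of the updated tokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)

old_vocab_size = model.get_input_embeddings().weight.shape[0]
unk_id = tokenizer.unk_token_id

# Grow the input embeddings (and the tied LM head) to the new vocabulary size;
# the rows added by resize_token_embeddings are randomly initialized.
model.resize_token_embeddings(NEW_VOCAB_SIZE)

# Overwrite every newly added row with a copy of the `[UNK]` embedding, so tokens
# that previously collapsed onto `[UNK]` still produce the same outputs.
with torch.no_grad():
    embeddings = model.get_input_embeddings().weight
    embeddings[old_vocab_size:] = embeddings[unk_id]

model.save_pretrained("enlarged-model-sketch")  # placeholder output directory
```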
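Because the two versions have differently sized embeddings (and LM head), a checkpoint only works together with the tokenizer and configuration from the same version. Below is a minimal sketch of pinning one version when loading, and of instantiating the encoder without the pooler that version 1.1 no longer ships. The repository id and revision name are placeholders (the real branch, tag, or commit hash has to be taken from the repository), and `add_pooling_layer` is a keyword accepted by BERT/RoBERTa-style models in `transformers`.

```python
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "org/model"   # placeholder repository id
REVISION = "main"        # placeholder git revision (branch, tag, or commit hash)

# Load the tokenizer and the weights from the same revision, so the vocabulary
# size matches the shapes of the embedding matrix and LM head.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=REVISION)
model = AutoModel.from_pretrained(MODEL_ID, revision=REVISION,
                                  add_pooling_layer=False)

# Without a pooler, the useful output is the per-token hidden states.
inputs = tokenizer("Hello world", return_tensors="pt")
hidden_states = model(**inputs).last_hidden_state
print(hidden_states.shape)
```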