JingweiZuo committed
Commit 4493883 • 1 Parent(s): bb01ae9
Fix: encoder and decoder in tokenizer
Hi!
When evaluating the RWKV-v5-Eagle-7B-HF model, I ran into the error shown below. It is most likely caused by the tokenizer: in the code, the encoder and decoder mappings are reversed, as discussed here: https://huggingface.co./RWKV/v5-Eagle-7B-HF/discussions/9#65d4dc35f9cbfa798c4be4b3
![Evaluation error on RWKV-v5-Eagle-7B-HF](https://cdn-uploads.huggingface.co/production/uploads/6460c3811db65f878513bcaf/NIQHGPQ7lmSRvkrY4SAax.png)
This PR fixes the issue.
Thanks,
tokenization_rwkv_world.py
CHANGED
@@ -106,11 +106,11 @@ class RWKVWorldTokenizer(PreTrainedTokenizer):
             assert isinstance(x, bytes)
             assert len(x) == int(l[l.rindex(" ") :])
             sorted += [x]
-            self.encoder[
+            self.encoder[x] = idx
 
         self.decoder = {}
         for k, v in self.encoder.items():
-            self.decoder[v] =
+            self.decoder[v] = k
 
         self.trie = TRIE()
         for t, i in self.decoder.items():
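
To illustrate the direction of the two mappings after this change, here is a minimal standalone sketch. It assumes that `idx` is an integer token id and `x` is the token's bytes, as the added lines suggest. The three-entry `vocab`, `encode_greedy`, and `decode` are hypothetical names made up for this example. The greedy longest-match loop is only a stand-in for the trie lookup and is not the tokenizer's actual implementation.

```python
# Hypothetical three-token vocabulary of (id, token bytes) pairs.
# The real tokenizer reads its entries from a vocabulary file.
vocab = [(1, b"a"), (2, b"b"), (3, b"ab")]

# encoder: token bytes -> integer id (same direction as `self.encoder[x] = idx`).
encoder = {}
for idx, x in vocab:
    assert isinstance(x, bytes)
    encoder[x] = idx

# decoder: integer id -> token bytes, built by inverting the encoder
# (same direction as `self.decoder[v] = k`).
decoder = {}
for k, v in encoder.items():
    decoder[v] = k


def encode_greedy(data):
    """Encode bytes into ids by greedy longest match (stand-in for a trie)."""
    max_len = max(len(token) for token in encoder)
    ids = []
    pos = 0
    while pos < len(data):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(data) - pos), 0, -1):
            piece = data[pos:pos + length]
            if piece in encoder:
                ids.append(encoder[piece])
                pos += length
                break
        else:
            raise ValueError(f"no token matches at byte offset {pos}")
    return ids


def decode(ids):
    """Decode ids back into bytes by concatenating the decoder entries."""
    return b"".join(decoder[i] for i in ids)


ids = encode_greedy(b"abba")
round_trip = decode(ids)
```

With the mappings oriented this way, encoding followed by decoding returns the original bytes. If the two dictionaries were swapped, the `encoder[piece]` lookup would be keyed by ids instead of bytes and would fail.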