Qubitium committed on
Commit 7cb66c0
1 Parent(s): 34ba11f

Update README.md

Files changed (1): README.md +1 -1
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
 ## Why should you use this and not the tiktoken included in the orignal model?
 1. Original tokenizer pad the vocabulary to correct size with `<extra_N>` tokens but encoder never uses them
 2. Original tokenizer use eos as pad token which may confuse trainers to mask out the eos token so model never output eos.
-3. [PEDNING] config.json embedding size of "vocab_size": 100352 does not match 100277
+3. [NOT FIXED: INVESTIGATING] config.json embedding size of "vocab_size": 100352 does not match 100277
 
 modified from original code @ https://huggingface.co/Xenova/dbrx-instruct-tokenizer
 
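The eos-as-pad problem from point 2 of the diff can be sketched in a few lines. This is a minimal illustration, not code from the repository: the token ids are made up, and `mask_labels` stands in for the typical causal-LM label preparation step that replaces padding positions with an ignore index before computing the loss.

```python
# Hypothetical token ids for illustration only.
EOS_ID = 100257          # assumed eos id
DISTINCT_PAD_ID = 100276 # a dedicated pad id, as this tokenizer provides

def mask_labels(token_ids, pad_id, ignore_index=-100):
    """Typical label prep: exclude padding positions from the training loss."""
    return [ignore_index if t == pad_id else t for t in token_ids]

# A padded example that ends with eos before the padding.
# With pad == eos (the original tokenizer's setup), the real eos at index 3
# is masked out along with the padding, so the model is never trained to
# emit eos and generation may fail to stop.
shared = [11, 22, 33, EOS_ID, EOS_ID, EOS_ID]
print(mask_labels(shared, EOS_ID))        # eos is lost: [11, 22, 33, -100, -100, -100]

# With a distinct pad id, the eos label survives.
separate = [11, 22, 33, EOS_ID, DISTINCT_PAD_ID, DISTINCT_PAD_ID]
print(mask_labels(separate, DISTINCT_PAD_ID))
```

The second call keeps `EOS_ID` at index 3 in the labels while still ignoring the padding, which is the behavior a trainer usually wants.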