I have deleted the model because I found a problem with the tokenizer. Will upload the corrected model in the next few days.
#1 by OwenArli - opened
Sorry for the off-topic ping, but are you training this on the base model or the instruct model?
Base Qwen 32B worked really well for the EVA finetune, and it seems to retain >64K context without needing YaRN, which the instruct model does need. I may be too late, but consider finetuning on the base model instead of Qwen Instruct.