I have deleted the model because I found a problem with the tokenizer. Will upload the corrected model in the next few days.

by OwenArli - opened 6 days ago


Downtown-Case

3 days ago

@OwenArli

Sorry for the off-topic ping, but are you training this on the base model or on the Instruct model?

Base Qwen 32B worked really well for the EVA finetune, and unlike the Instruct model, it seems to retain >64K context without needing YaRN. I may be too late, but consider finetuning on the base model instead of Qwen Instruct.
