Is this model native 128K context length, or YaRN extended?

#28 opened by danielhanchen

Hi there! Great work as usual!

I'm inquiring whether the model has a native 128K context length, or whether it is YaRN-extended.

The readme says:
5. Handle Long Inputs: For inputs exceeding 32,768 tokens, enable YaRN to improve the model's ability to capture long-sequence information effectively.

For supported frameworks, you could add the following to config.json to enable YaRN:

{
  ...,
  "rope_scaling": {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn"
  }
}
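
(For transformers specifically, it should also be possible to apply this at load time without editing config.json on disk. A minimal sketch, assuming the repo id Qwen/QwQ-32B; adjust the id or use a local path as needed:)

from transformers import AutoConfig, AutoModelForCausalLM

model_id = "Qwen/QwQ-32B"  # assumed repo id for this discussion's model

# Load the config and attach the YaRN rope_scaling block quoted above,
# instead of editing config.json on disk.
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)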

Also, the config.json file was changed:

[screenshot of the updated config.json]

But the main readme says "Context Length: Full 131,072 tokens".

In this Reddit thread https://www.reddit.com/r/LocalLLaMA/comments/1j5qo7q/qwq32b_infinite_generations_fixes_best_practices/ it was mentioned that "The Qwen team confirmed for long context (128K), you should use YaRN". So if you set the context length beyond 32,768 tokens, it seems you need to edit config.json manually. YaRN should work with EXL2 quants and TabbyAPI, and my guess is that most other popular backends support it too.
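
A quick way to check what the config actually declares after editing (just a sketch, using transformers to read the config; substitute your local model directory for the assumed repo id):

from transformers import AutoConfig

# Assumed repo id; point this at your local model directory if you
# edited config.json there.
cfg = AutoConfig.from_pretrained("Qwen/QwQ-32B")
print(cfg.rope_scaling)             # None unless the YaRN block was added
print(cfg.max_position_embeddings)  # currently 131072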

@Lissanro Oh hi, that was my post!! :)) I'm still communicating with the Qwen team on how to apply YaRN correctly. Weirdly, when I did it, the results got worse, so I'm unsure what the correct settings are!

If you find out more, please share! For now I just added the rope_scaling block to config.json, as your first post suggests. Since max_position_embeddings is already set to 131072, I did not change it.

It would be great if they shared a config.json already edited to properly enable YaRN, perhaps named config_yarn.json so it could easily be renamed to config.json when needed. This would greatly reduce the possibility of mistakes and ambiguity.
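
In the meantime, something like this could generate that file locally (untested sketch; assumes you run it inside the model directory, and reuses the rope_scaling values from the README snippet above):

import json
from pathlib import Path

# Read the model's config.json, add the YaRN block from the README,
# and write the result out as config_yarn.json.
config = json.loads(Path("config.json").read_text())
config["rope_scaling"] = {
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
    "type": "yarn",
}
# max_position_embeddings stays at 131072, as noted above.
Path("config_yarn.json").write_text(json.dumps(config, indent=2) + "\n")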

Also, if the model is natively 32,768 tokens, then perhaps max_position_embeddings in the main config.json should be changed back to 32768 so there is no confusion. Great model in any case; I'm just sharing suggestions to make it easier to use.

I will share whatever I learn from them!
