4k context by default?

#1
by mclassHF2023 - opened

Thanks for the GGUFs as always! Just a question:
The original model has a 160k context length. I'm not familiar with all the rope_freq_base and compress_pos_emb settings, but something seems off.
I do believe it's possible to get more than 4k context out of this, but I don't really know how.

Sorry for the bump, but I'm still confused by this one...

oh sorry about that, if you look at the original model's config.json file it'll give you some guidance:

https://huggingface.co./deepseek-ai/DeepSeek-V2.5-1210/blob/main/config.json#L39

  "rope_scaling": {
    "beta_fast": 32,
    "beta_slow": 1,
    "factor": 40,
    "mscale": 1.0,
    "mscale_all_dim": 1.0,
    "original_max_position_embeddings": 4096,
    "type": "yarn"
  },

so you'll need to set the RoPE scaling type to yarn

you should be able to see the options if you use --help, I think, but you can also find them documented in the server README here:

https://github.com/ggerganov/llama.cpp/tree/53ff6b9b9fb25ed0ec0a213e05534fe7c3d0040f/examples/server

should work for llama-cli as well I'm pretty sure, just CTRL+F for 'yarn'
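
e.g. instead of scrolling, something like this should pull out the relevant flags directly (assuming you're in a llama.cpp build directory and that the binary names match your build):

./llama-server --help 2>&1 | grep -i yarn
./llama-cli --help 2>&1 | grep -i rope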

my best guess is you'll want:

--rope-scaling yarn --yarn-orig-ctx 4096

plus a -c / --ctx-size for however much context you actually want. Note that the "factor": 40 in the config is the context extension factor (4096 × 40 = 163,840 tokens max), not --yarn-attn-factor; --yarn-attn-factor corresponds to mscale, which already defaults to 1.0.

beta_slow defaults to 1 and beta_fast defaults to 32, so those already match the config
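
putting that together, a full invocation would look roughly like this (just a sketch: the .gguf filename and the -c value are placeholders, swap in whichever quant file and context size you actually want):

./llama-server -m DeepSeek-V2.5-1210-Q4_K_M.gguf -c 32768 \
  --rope-scaling yarn --yarn-orig-ctx 4096 \
  --yarn-beta-fast 32 --yarn-beta-slow 1

raising -c is what actually buys you the longer context; the yarn flags just control how the RoPE gets stretched to cover it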

you might also be able to get away with JUST --rope-scaling yarn; there's an implication that it can pull the rest from the model's metadata, but I'm not positive
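
if that works, the minimal form would just be something like this (filename is a placeholder again, and you'd still pass -c for however much context you want):

./llama-cli -m DeepSeek-V2.5-1210-Q4_K_M.gguf -c 16384 --rope-scaling yarn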
