4096 context?

#3 - opened by david565

Meta reports that the base models support 4096 context. Is it possible to make GGML models with 4096 context?

llama.cpp:
$ ./main -c 4096 -m /media/data/llama-2-13b.ggmlv3.q6_K.bin
main: warning: base model only supports context sizes no greater than 2048 tokens (4096 specified)

I'm getting the same warning.

Looking at this commit: https://huggingface.co./meta-llama/Llama-2-13b-hf/commit/f3b475aaed299d2389525d6ce4e542cc438833a4

"max_position_embeddings": 2048,

3 days ago this was changed to:

"max_position_embeddings": 4096

Edit: oops, that's the HF model, so I guess I'm not sure.
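
For anyone who wants to confirm which value their local copy of the HF repo has, something like this works (the path here is just an example):

$ grep max_position_embeddings /path/to/Llama-2-13b-hf/config.json
    "max_position_embeddings": 4096,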

Yeah, I need to fix those config.json files and will do it now.

But it won't change that warning message, which is currently hardcoded into llama.cpp and can be ignored for models you know have >2048 context:

[screenshot: the hardcoded 2048-token warning check in the llama.cpp source]

So to be clear: yes, my config.json files are wrong and will be updated, but that in no way affects the GGML models, which work fine at 4096 context, or even greater using RoPE scaling. And to be honest it doesn't really affect the GPTQ models either, as the value in config.json is just a default/baseline and most clients let you specify the context value independently.
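
To make that concrete, here's a minimal sketch using the same model file as in the first post; the first command just ignores the hardcoded warning, and the second assumes a llama.cpp build new enough to include the RoPE scaling flags:

# native 4096 context: the 2048-token warning still prints but can be ignored
$ ./main -c 4096 -m /media/data/llama-2-13b.ggmlv3.q6_K.bin

# ~8192 context via linear RoPE scaling (scale factor 4096/8192 = 0.5), assuming your build has --rope-freq-scale
$ ./main -c 8192 --rope-freq-scale 0.5 -m /media/data/llama-2-13b.ggmlv3.q6_K.bin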
