
Context length?

#8
by AIGUYCONTENT - opened

I downloaded this quant last night: https://huggingface.co/BigHuggyD/anthracite-org_magnum-v2-123b_exl2_8.0bpw_h8

What is the suggested context length? I currently have it set to 55,000 (an arbitrary number).

Also, this model does not work with cfg-cache and guidance_scale turned on in Oobabooga. Mr. Oobabooga himself points to a paper claiming that turning on cfg-cache can make the model smarter: https://www.reddit.com/r/Oobabooga/comments/1cf9bso/what_does_guidance_scale_parameter_do/

Considering how quants essentially perform a lobotomy on models, I am hoping to get cfg-cache working with this model.

Anthracite org

we train on 8192 ctx, but you can try more and see if it becomes incoherent; varies by samplers and use-case.

"I am hoping to get cfg-cache working with this model."
hope you get it working! report back if it works.

By the looks of it, the model loses its mind after 15k context. I've tried BigHuggyD/anthracite-org_magnum-v2-123b_exl2_8.0bpw_h8 and schnapper79/lumikabra-195B_v0.3-exl2-4.0bpw, and with both it starts to mumble about the most random things after 15k. This is with just the normal min_p preset on ooba and the DRY sampler set to 0.8, and I tested with it off as well. I really do hope you guys train on higher context in the future, because I really love your models, but 8k ctx is way too low if the original model supports far more.

Lastly, if funds are the issue, how much are we talking to train on full context instead of 8k, if you make a v3 of this or a future Mistral model?

In my honest opinion, that seems to be the case with most Mistral models as a whole. For example, Nemo claims 128k but only really goes up to 16k, and the same goes for the 22B in my experience; anything past that and it's very "eh" in terms of recall.
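
If anyone wants to put a rough number on where recall falls off instead of eyeballing it, a simple "needle in a haystack" probe is one way to do it. Below is a minimal sketch, not a proper benchmark. The names `FILLER`, `build_probe` and `recalled` are made up for illustration, and word count is used as a crude stand-in for token count, so the real context used will be somewhat higher than the number you pass in. Sending the prompt to the model is left to whatever frontend you use.

```python
import random

# Hypothetical filler sentence; any neutral text works. It has 9 words.
FILLER = "The quick brown fox jumps over the lazy dog."


def build_probe(target_words, depth=0.5, seed=0):
    """Build a long prompt with one hidden fact inside it.

    target_words: approximate filler length in words (only a rough
                  proxy for tokens, which is an assumption here).
    depth:        where the fact goes, 0.0 = start, 1.0 = end of filler.
    Returns (prompt, secret).
    """
    rng = random.Random(seed)
    secret = str(rng.randint(100000, 999999))
    needle = f"The secret code is {secret}."

    n_sentences = max(1, target_words // len(FILLER.split()))
    sentences = [FILLER] * n_sentences
    sentences.insert(int(depth * n_sentences), needle)

    question = ("What is the secret code mentioned in the text above? "
                "Answer with the number only.")
    return " ".join(sentences) + "\n\n" + question, secret


def recalled(reply, secret):
    """True if the model's reply contains the hidden code."""
    return secret in reply


if __name__ == "__main__":
    # Build probes at increasing sizes; paste each prompt into your
    # frontend and check the reply with recalled(reply, secret).
    for words in (4000, 8000, 12000, 16000):
        prompt, secret = build_probe(words, depth=0.5)
        print(words, len(prompt.split()), secret)
```

Running the same sizes at a few different depths (say 0.1, 0.5 and 0.9) and noting the first size where `recalled` comes back false gives a rough usable-context figure for a given quant and sampler setup.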
