I merged Aurelian with itself using mergekit, creating this EXTENDED LENGTH FRANKENSTEIN.

Does it work?

Yes. At 17k context it stays coherent, but starts to lose minor details of the story. Not sure how well it performs at 32k though. Quantization has a significant impact on quality for this model: going from Q6_K to Q5_K caused a noticeable drop.

Is it worth it?

Maybe? Depends? Do you hate Mixtral? Do you have good hardware/patience? Do you need a somewhat smart model with 32k context?

Known issues

VERY strict adherence to prompt format, forgetfulness, strong roleplay bias.

Personal opinion

Dumber than Goliath, but has far fewer GPTisms. If you want a 32k Goliath, maybe try Goliath-longLORA-120b-rope8-32k-fp16.

Prompt format

Same as Aurelian 0.5.

[INST] <<SYS>>
System prompt, default is: An interaction between a user providing instructions, and an imaginative assistant providing responses.
<</SYS>>
</s><s>[INST] {Put your input text here.}
[/INST] {Model output}

This model doesn't like it too much when you change the prompt format, so even keeping that </s><s> is important.
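Since the format is this strict, it may help to build the prompt in code rather than by hand. Below is a minimal sketch that reproduces the template above for a single turn. The names `build_prompt` and `DEFAULT_SYSTEM` are made up for illustration, and the trailing space after `[/INST]` is an assumption taken from how the template is written; check whether your frontend or tokenizer also adds its own `<s>` at the start.

```python
# Hypothetical helper: assembles a single-turn prompt in the format shown above.
# The default system prompt is copied from the template; everything else here
# is an illustrative assumption, not part of the model's tooling.
DEFAULT_SYSTEM = (
    "An interaction between a user providing instructions, "
    "and an imaginative assistant providing responses."
)


def build_prompt(user_text: str, system_prompt: str = DEFAULT_SYSTEM) -> str:
    """Return the prompt string, keeping the literal </s><s> before the user turn."""
    return (
        "[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n"
        f"</s><s>[INST] {user_text}\n"
        "[/INST] "  # assumed trailing space; the model output follows directly
    )


if __name__ == "__main__":
    print(build_prompt("Write the opening scene of a heist story."))
```

Keeping the template in one function makes it harder to accidentally drop the `</s><s>` when editing prompts.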

Benchmarks

NeoEvalPlusN_benchmark

My meme benchmark.

| Test name | Aurelian | DoubleGold |
|-----------|----------|------------|
| B         | 1        | 1          |
| C         | 1        | 1          |
| D         | 0        | 2          |
| S         | 2.5      | 3.25       |
| P         | 2.25     | 1.5        |
| Total     | 6.75     | 8.75       |
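
As a quick sanity check, the totals can be recomputed from the per-test scores. This is a small sketch that assumes the Total row is a plain unweighted sum of the five tests (the card does not state the scoring rule); the `scores` dictionary just restates the numbers above.

```python
# Per-test scores copied from the table above.
# Assumption: Total = unweighted sum of B, C, D, S and P.
scores = {
    "Aurelian": {"B": 1, "C": 1, "D": 0, "S": 2.5, "P": 2.25},
    "DoubleGold": {"B": 1, "C": 1, "D": 2, "S": 3.25, "P": 1.5},
}

totals = {model: sum(tests.values()) for model, tests in scores.items()}

for model, total in totals.items():
    print(f"{model}: {total}")
```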