Model seems to have a repetition issue.

#4
by ockerman0 - opened

I was giving this model a go earlier using one of my own character cards. The earlier messages were all pretty good, though roughly 20 messages in I began to notice the model would start to repeat whole phrases from earlier messages over and over. Using the find feature in my browser, I noticed some phrases repeated up to five times.

Unrelated, though worth mentioning as well, is the model's tendency to just recite the character's description instead of actually formulating a response. I have also observed this when trying to generate a summary.

For reference, I have tried the static Q8 and Q6_K quants, as well as the F16.Q6 quant; all three seem to have the same issue.

Yeah, some repetition is still present in v1.2; we are working on fixing that in the new version. We are filtering the most repetitive samples out of the C2 Logs subset used in this model, and adding RP formatting to the D/WP dataset to, potentially, mix it better with the RP data. This will also hopefully fix the second issue (which likely happens because some character cards in C2 take up a large part of the 8k sequence length).
We have also received reports of the model being more repetitive on inference engines other than vLLM and Aphrodite; more specifically, of repetition/frequency/presence penalties being less effective than expected.
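If you want to check how the penalties behave on your own setup, here is a minimal sketch of passing them through an OpenAI-compatible completions endpoint (both vLLM and Aphrodite expose one); the URL, port, model name, and penalty values are placeholders for your own deployment:

```python
# Minimal sketch: sending repetition/frequency/presence penalties to an
# OpenAI-compatible completions endpoint (vLLM and Aphrodite both serve one).
# The URL, port, model name, and penalty values are placeholders.
import requests

payload = {
    "model": "your-model-name",   # placeholder
    "prompt": "Your prompt here",
    "max_tokens": 256,
    "frequency_penalty": 0.3,     # standard OpenAI-style sampling param
    "presence_penalty": 0.3,      # standard OpenAI-style sampling param
    "repetition_penalty": 1.1,    # extra param both engines accept in the body
}

resp = requests.post("http://localhost:8000/v1/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```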

Nothing is Real org

@ockerman0 try using the Aphrodite engine with a rep pen of 1.1

Nothing is Real org

A friend of mine found repetition differences between engines, and Aphrodite pretty much fixes it with a rep pen of 1.1.
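If it helps, here's a rough sketch of setting that rep pen from a script once Aphrodite's OpenAI-compatible server is up; the port assumes Aphrodite's usual default, and the model name is a placeholder:

```python
# Sketch: requesting a completion from a running Aphrodite server with a
# repetition penalty of 1.1. Port and model name are assumptions; adjust
# them for your own deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:2242/v1", api_key="dummy")

resp = client.completions.create(
    model="your-model-name",  # placeholder
    prompt="Your prompt here",
    max_tokens=256,
    extra_body={"repetition_penalty": 1.1},  # passed through to the engine
)
print(resp.choices[0].text)
```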

Thanks, I'll have a look and see if I can find any differences. Something I should've mentioned in the first post is that I was using KoboldCpp. Aphrodite doesn't seem to support many AMD graphics cards at the moment, so I'll have to try vLLM.

Nothing is Real org

@ockerman0 you can also try loading the EXL2 quants with TabbyAPI; the EXL2 format seems to be the least problematic one, and Tabby supports ROCm.
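In case it saves anyone a step, here is a quick sketch of pulling an EXL2 quant from the Hub to point TabbyAPI at; the repo id and branch below are placeholders, since EXL2 uploads are often published as one branch per bits-per-weight level:

```python
# Sketch: downloading an EXL2 quant for TabbyAPI to load locally.
# repo_id and revision below are placeholders, not real locations.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="SomeOrg/SomeModel-exl2",  # placeholder repo
    revision="6.3bpw",                 # placeholder branch for the quant level
)
print(local_dir)  # point TabbyAPI's model directory at this path
```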

I've tried a short conversation using the 6.3bpw EXL2 quant with TabbyAPI, and it seems to behave mostly the same. Hopefully the new version will do something to alleviate this.

Changing the repetition penalty range seems to help with the issue somewhat, though to what extent I'm not sure.
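For anyone else wanting to experiment with it, this is roughly what tweaking the range looks like against KoboldCpp's KoboldAI-style generate endpoint; the port and parameter values here are just assumptions to adjust:

```python
# Sketch: setting the repetition penalty range via KoboldCpp's
# KoboldAI-style API. Port and values are assumptions, not recommendations.
import requests

payload = {
    "prompt": "Your prompt here",
    "max_length": 256,
    "rep_pen": 1.1,          # repetition penalty strength
    "rep_pen_range": 2048,   # how many recent tokens the penalty applies to
    "rep_pen_slope": 0.7,    # how sharply the penalty tapers within the range
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```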
