Relative success?
I think this model has achieved pretty much what you were aiming for. Conversations are noticeably more fluid and less sloppy; characters are more emotional and give you great reactions/responses to work with. There's real depth/nuance: it understands undertones, reads the room, and has a general sense of what's contextually appropriate and possible. It can do and understand irony/sarcasm, can be evasive, and can detect insinuation and respond to that instead of what was literally said. It doesn't shy away from being rude where it fits, and characters don't behave one-dimensionally. It handles esoteric and intense characters well too, playing them more grounded where many other models go over the top with flamboyance/ostentation. It can reference relevant details from earlier messages and draw between-the-lines conclusions that only emerge from linking two different messages, even with quite a few others in between (that's wild).
A couple of issues. Narration capabilities are limited, so it's not a good fit for adventures. It sometimes struggles to stay coherent, especially when there are a lot of moving elements or independent agents. It can also randomly become incoherent in just about any circumstance, but in those dire cases regenerating the response mostly helps. And like all Nemo models it can be a bit too eager to jump into erotic stuff. One other thing: sometimes the responses are too short. That's better than too long, certainly, but it gets dry fast if you let it fall into this pattern, and that's even with rep pen settings disabled! I'm talking 30-token responses. It gave its best output when it wrote around 100–300 tokens; I wish it would steer itself to stay in that range by default. "Continue the last message" in SillyTavern helps, it just isn't ideal compared to not needing it so often.
Overall, despite the flaws, I think there's something special here. It's already nice enough to enjoy conversations and basic RP with some cards as-is; you just need to be patient with rerolls.
If you keep developing it to be even more human-like, I think it could benefit a lot from some in-character refusal examples. It already knows how to get mad at the user and be rude, passive-aggressive, or disapproving; now it just needs to know how to be disagreeable and shut the user down when appropriate. Also, what do you think of this dataset: jondurbin/truthy-dpo-v0.1?
Used the Q8 quant, ChatML with the system prompt disabled. Samplers: temp 0.7, min_p 0.05, top_a 0.2, smoothing factor 0.2.
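For anyone unfamiliar with the min_p setting above, here's a minimal sketch of the idea (my own toy illustration, not SillyTavern's or any backend's actual implementation): tokens whose probability falls below min_p times the top token's probability are dropped, and the remainder is renormalized.

```python
def min_p_filter(probs, min_p=0.05):
    # Keep only tokens whose probability is at least min_p times
    # the most likely token's probability, then renormalize.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]

# Toy distribution over 5 tokens; with min_p=0.05 the threshold is
# 0.05 * 0.50 = 0.025, so only the 0.01 token gets dropped.
probs = [0.50, 0.30, 0.15, 0.04, 0.01]
filtered = min_p_filter(probs)
```

The nice property is that the cutoff scales with the model's confidence: when the model is certain, the tail gets pruned aggressively; when the distribution is flat, more candidates survive.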
Hi there, thank you for the feedback. In general I agree with everything you've mentioned. Responses currently trend short and I'm trying to fix that; it's not an exact science. As for adventure RP, could you share the system prompts or character cards you used? I can work with them and include them in the RL dataset.