feedback
It gives very boring, sometimes incoherent replies. Actually, I think the 8B Lumimaid version was more interesting to use. I don't know how this can be. Somehow it's degraded compared to the base Llama 70B model, or it just doesn't like the chat format. But thanks for the hard work. Hope to see new iterations of it. What are your thoughts on it?
Just chiming in with my experience of using a Q4_K_S quant of this model. For me, it has consistently good outputs for roleplaying that are coherent, creative, and drive the story forward in interesting ways. This model has been my new go-to 70B since its release.
I'm not discrediting your experience of course, and I hope someone more knowledgeable can help out. What puzzles me is that you had a better experience with Lumimaid 8B. The only thing I can think of is that maybe you used a high bpw quant of the 8B and a really low (1-2bpw) quant of the 70B, thus the degraded performance. Other than that, idk.
I'm not sure about this one. It doesn't feel much different than the original 70B; the only difference is that it gives shorter outputs. It's def not much hotter. Is this based on the Llama 3 instruct or the base Llama 3? Anyway, I still believe in your skills, master Undi. Was thinking about getting an Undi95 tattoo on my heart <3
@MrHillsss
AHEM
@OrangeApples
Thanks for the feedback!
@Szarka
Can you please provide us with some more info?
For example: quant, RP format (novel: "say" do; markdown: say *do*; hybrid: "say" *do*), backend, prompt format, language, and so on. That would help us understand the issue and make further iterations of the model better.
I've had a similar experience to @Szarka , actually. I'm running the q8_0 quant of the 8B and the q4_k_m quant of the 70B. To me, the 70B is pretty bland compared to the 8B, and it's massively slower. I'd say it might understand context a little better, but its responses ignore basic things (e.g. what the character was last doing, like drinking a soda) a bit more often than the 8B, which has seemed odd.
The only thing I can think of is that I'm running both through LMStudio. I know there are way more settings/knobs available using Text Completion in SillyTavern with koboldcpp or oobabooga, but on my M1 Mac LMStudio's raw performance is much better, and I haven't fought with compile flags, etc., to get a good setup for one of the others. (It's on my todo list, but LMStudio just works with minimal effort.)
I dunno, this isn't the first model people have raved about that my experience has been really lack-luster with. (The 8B model is killing it for me, though.) My only guess is to blame LMStudio, lol.
I'm currently testing the Q4_K_M i1 quant with 8k context in the latest KoboldCpp (only MinP sampling, at 0.1). For me it gives really nostalgic Noromaid vibes ^^
Just after a few hours of playing with it, I'm not yet sure if I'd use it as a daily driver, but I find it definitely coherent and creative, and a lot hornier and more uncensored than Llama 3. Llama 3 just played the character as too shy; Lumimaid jumped right in, going with the flow of the first message.
So I think it fares well for ERP, but I must still check with other characters to see if it has the prose or brains for other kinds of RP...
Edit: after further testing, this model is really funny and has a great Llama 3 personality, but it just writes replies that are too short. I tried tweaking everything, but to no avail. The smoothing factor seems to help a bit, but at the same time it dumbs down the model...
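In case anyone wants to try the same setup, here's a minimal sketch of MinP-only sampling through KoboldCpp's generate API. The field names follow KoboldCpp's API as I understand it, and the commented-out smoothing_factor is the optional tweak mentioned above; treat it as a starting point, not a definitive config:

```python
import requests

# Minimal sketch: MinP-only sampling against a local KoboldCpp instance
# (default port assumed). Everything else is set to its neutral value.
payload = {
    "prompt": "<your Llama 3 formatted prompt here>",
    "max_context_length": 8192,
    "max_length": 300,
    "temperature": 1.0,   # neutral
    "top_k": 0,           # disabled
    "top_p": 1.0,         # disabled
    "min_p": 0.1,         # the only active sampler
    "rep_pen": 1.0,       # disabled
    # "smoothing_factor": 0.23,  # optional: helps reply length, but dumbs it down a bit
}
r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])
```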
I'm mainly using it for RP chat where the AI replies as the defined character.
Language:
- english
Format:
- hybrid "say" do
Quant:
- IQ3_XXS (can't run a higher quant)
Backend:
- KoboldCPP
Frontend:
- Sillytavern
Prompt format:
- Llama 3 prompt with a system prompt that forces hybrid formatting in replies
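For reference, here's what that Llama 3 prompt looks like in practice. The system line forcing hybrid formatting is just an illustrative example, not my exact prompt:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Write {{char}}'s replies in hybrid format: "spoken dialogue" in quotes, *actions* in asterisks.<|eot_id|><|start_header_id|>user<|end_header_id|>

"Hi there." *waves*<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```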
Great! Gonna try it with another format then. Thanks
Tried it with markdown. I see only a tiny amount of improvement. Still bland, boring, and repetitive compared to the Lumimaid 8B.
I think what would help would be good presets for SillyTavern or other platforms. I will say that with SillyTavern's default Llama 3 Instruct context and instruct templates, this model is very terse with its replies.
I'm using the same preset with the same prompt on the 8B Lumimaid and it's a lot better, even though it has weaknesses due to being an 8B.
I had a similar experience running the uncompressed weights on the latest version of transformers. Every response was terse. I thought I did something wrong. Then I realized I would get out what I put in: if I was terse, it was terse. I think I had become spoiled by models where all I had to do was say hi and they would respond back, filling up my entire token limit. I'm using the same system prompt I created to test all other models (formatted to the model). Perhaps I should go back and make a custom prompt for this one.
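For context, here's roughly how I'm running it, as a minimal sketch with transformers. The repo id is an assumption; substitute whichever weights you're using:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NeverSleep/Llama-3-Lumimaid-70B-v0.1"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# System prompt is a placeholder; I use the same one to test all models,
# formatted via the model's own chat template.
messages = [
    {"role": "system", "content": "You are {{char}}. Write long, in-character replies."},
    {"role": "user", "content": "Hello"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=300, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```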
@BigHuggyD I tried everything I could that worked with other similar models. Let's hope the next iteration will be better!
I actually posted my first interaction for my Discord friends because I was amused:
Me: Hello
AI: Hi
Me: Hi
AI: Hey
Me: Hey
AI: Hello
Me: Are you a parrot?
AI: Sorry, I was just trying to be friendly.
Hahahaha, alright, alright, there is some issue apparently. The 8B got praised but the 70B was lacking some love. We didn't have any issues in the testing phase though, but it can happen! Thanks for your feedback kek
Not meant to be a dig! Love your work. I recognize it's 0.1 for a reason.
No worries, honest feedback is the best feedback!
Just to add to the 8B fans' side xDDD I love it. Great model. It's fun, has a charming personality, the RP is quite good, and the knowledge it shows when asked is vast and detailed.
I didn't test the 70B version though; my PC is not good enough for that.
I really like the 70B already. It takes a bit of tuning and tweaking with the samplers, but then it gives good output. Very promising, certainly.
So, I've been playing around since my post above. Seems like the 70B really benefits from better parameters and the finer control of running it through koboldcpp or ollama. (I've been playing with both.) I think the 8B is 'hotter'; it has a distinctive style that comes out more, and it is very fun and playful even 'stock'. But it does have some of the limitations of an 8B the longer I play with it: initial messages are great, but it can lose the plot with little things like remembering if a character had already opened a can of soda, or have the character repeat an action that I just performed, like opening a door or touching someone's arm. The 70B is less enthusiastic in its style, but it's a dark horse. It seems to handle 100+ message RPs reasonably fine, long past the point where the 8B has started getting weird or losing the character's personality.
As of now, the 70B (alt) is what I'm running consistently, and it seems to work with all the character cards I have, sticking to things a lot better than the others. I can feel the Noromaid roots; the things I really liked about the Noromaid line seem to shine in this one, too. To borrow a headphone analogy, I'd say the 8B is like a really bass-heavy pair of headphones: initially a lot of people go, "oh, heck yeah!", but you can't really tune that out of them if it's not what you want. The 70B is flatter; tweak your EQ some, and you've got an amazing sound tuned to your preference.
Honestly, I didn't suspect I'd land on the Lumimaid-70B-alt as my new daily driver... but here we are. I feel bad for calling it 'bland' now; it's more neutral, which is a good thing.
My experience is that the 70B model wasn't very good, but the 70B alt actually works fine for most of the things I did with it, and it is becoming one of my favorite models for its knowledge of various kinds of things (many of which are NSFW, so I won't be more specific).
What parameters have people been using for the 70B alt model? My experience so far has been to turn Mirostat off and keep the temperature at about 0.8 or lower, since otherwise things get weird fast. But I've still definitely seen issues like it forgetting what a character was just described as wearing, or odd capitalization.
[Running Q4_K_M GGUF in kobold.cpp on Apple Silicon 64GB, 8k context, SillyTavern front end and character cards, Llama3 prompt format, hybrid quoting style.]