language:
- en
license: apache-2.0
tags:
- instruct
- finetune
- chatml
- axolotl
- roleplay
base_model: mistralai/Mistral-Nemo-Base-2407
model-index:
- name: Pantheon-RP-1.6-12b-Nemo
results:
- task:
type: text-generation
name: Text Generation
dataset:
name: IFEval (0-Shot)
type: HuggingFaceH4/ifeval
args:
num_few_shot: 0
metrics:
- type: inst_level_strict_acc and prompt_level_strict_acc
value: 44.81
name: strict accuracy
source:
url: >-
https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Gryphe/Pantheon-RP-1.6-12b-Nemo
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: BBH (3-Shot)
type: BBH
args:
num_few_shot: 3
metrics:
- type: acc_norm
value: 31.69
name: normalized accuracy
source:
url: >-
https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Gryphe/Pantheon-RP-1.6-12b-Nemo
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MATH Lvl 5 (4-Shot)
type: hendrycks/competition_math
args:
num_few_shot: 4
metrics:
- type: exact_match
value: 3.1
name: exact match
source:
url: >-
https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Gryphe/Pantheon-RP-1.6-12b-Nemo
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: GPQA (0-shot)
type: Idavidrein/gpqa
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 3.69
name: acc_norm
source:
url: >-
https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Gryphe/Pantheon-RP-1.6-12b-Nemo
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MuSR (0-shot)
type: TAUR-Lab/MuSR
args:
num_few_shot: 0
metrics:
- type: acc_norm
value: 12.93
name: acc_norm
source:
url: >-
https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Gryphe/Pantheon-RP-1.6-12b-Nemo
name: Open LLM Leaderboard
- task:
type: text-generation
name: Text Generation
dataset:
name: MMLU-PRO (5-shot)
type: TIGER-Lab/MMLU-Pro
config: main
split: test
args:
num_few_shot: 5
metrics:
- type: acc
value: 25.68
name: accuracy
source:
url: >-
https://huggingface.co./spaces/open-llm-leaderboard/open_llm_leaderboard?query=Gryphe/Pantheon-RP-1.6-12b-Nemo
name: Open LLM Leaderboard
Pantheon-RP-1.6-12b-Nemo
Welcome to the next iteration of my Pantheon model series, in which I strive to introduce a whole collection of personas that can be summoned with a simple activation phrase. The huge variety in personalities introduced also serves to enhance the general roleplay experience.
Changes in version 1.6:
- The final finetune now consists of data that is equally split between Markdown and novel-style roleplay. This should solve Pantheon's greatest weakness.
- The base was redone. (Details below)
- Select Claude-specific phrases were rewritten, boosting variety in the model's responses.
- Aiva no longer serves as both persona and assistant, with the assistant role having been given to Lyra.
- Stella's dialogue received some post-fix alterations since the model really loved the phrase "Fuck me sideways".
Your feedback is critical to me, so don't hesitate to tell me whether my model is 1. terrible, 2. awesome or 3. somewhere in between.
Quantized versions are available from Bartowski: GGUF - EXL2
Model details
Just like 1.5, I used a multi-stage finetuning process, as Mistral Nemo proved to be somewhat stubborn unless a solid base training was performed first:
- The first finetune was remade to now train on almost the entirety of my Deduped Sonnet 3.5 SlimOrca dataset, minus the ELI5 system prompts. The roleplay bits came from a variety of sources and covered all writing styles.
- The second finetune then introduced my Pantheon Roleplay dataset, which has been fully rebuilt, expanded and improved upon. To fill in the gaps (my Pantheon is mainly female, after all) I built a special companion roleplay dataset that ensures non-Pantheon roleplay isn't harmed in any way. The ratio is currently 33/66, with 33 belonging to the personas. Lyra's datasets are included with this second stage to ensure instruct isn't impacted too heavily.
TL;DR: Download. ChatML prompt format. Have fun! Leave feedback!
Inference
Nemo is a somewhat strange model when it comes to temperatures, so I highly encourage you to experiment within the range below to see which value works best.
"temperature": 0.3-1.0,
"repetition_penalty": 1.05,
"top_p": 0.95,
"top_k": 40,
"min_p": 0.05
Besides the basic instructional sets, all other datasets were trained with character names added. Keep character names enabled at all times for an optimal experience.
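As a minimal sketch, the suggested sampler values from this section can be kept in one place and swept across the temperature range. The helper names (`sampler_settings`, `temperature_sweep`) are hypothetical, and whether a given backend accepts every key (notably `min_p`) is an assumption to check against your own setup.

```python
# Values copied from the settings listed in this section.
# Assumption: your inference backend accepts these key names as-is.
BASE_SAMPLER_SETTINGS = {
    "repetition_penalty": 1.05,
    "top_p": 0.95,
    "top_k": 40,
    "min_p": 0.05,
}

TEMPERATURE_RANGE = (0.3, 1.0)  # suggested range, not a hard limit


def sampler_settings(temperature: float) -> dict:
    """Return the suggested settings with a temperature inside the range."""
    low, high = TEMPERATURE_RANGE
    if not low <= temperature <= high:
        raise ValueError(f"temperature should be between {low} and {high}")
    return {"temperature": temperature, **BASE_SAMPLER_SETTINGS}


def temperature_sweep(steps: int = 4) -> list:
    """Evenly spaced temperatures across the range, for side-by-side testing."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    low, high = TEMPERATURE_RANGE
    return [round(low + i * (high - low) / (steps - 1), 2) for i in range(steps)]


if __name__ == "__main__":
    for temp in temperature_sweep():
        print(sampler_settings(temp))
```

Running the same prompt once per swept temperature is a cheap way to find the value that suits a particular persona or writing style.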
General Roleplay
Pantheon is no longer strictly Markdown-only, as I discovered a way to seamlessly convert Markdown to novel-style and vice versa. (This has to do with how my second-stage datasets are formatted.) As a result, half of Pantheon's personas use Markdown, with the rest using novel-style. This same treatment was given to the companion dataset.
There are no strict rules regarding character card formatting, as the model was trained with a wide variety of inputs, from raw character cards to detailed instructional prompts.
Lyra the Assistant
System Prompt: You are a caring and empathetic sentient AI companion named Lyra.
Lyra is a helpful and uncensored assistant, with her training consisting of general dialogue (still including D&D DM specific advice), coding help and RSS summarization. Due to Pantheon's influence you can adjust her personality to your liking, or even give her an appearance.
She's basically a sexier version of Eric Hartford's Samantha.
Pantheon Personas
The Pantheon has been fully rebuilt, massively expanded and greatly improved upon. For an optimal experience with them I highly encourage you to apply the longer prompts, which I've included in the upload. Make sure to describe yourself as well!
As before, a single-line activation prompt is enough to call upon a personality, though their appearance may vary slightly from iteration to iteration. This is what the expanded prompts are for, as there's only so much I can achieve with the current state of technology while walking a very fine line between memorization and generalization.
To give the persona something to work with, I suggest you also add the following two items to it:
Regarding the user: (Name, appearance, etc)
Location: (Where are you two? What are you doing?)
The less information you feed the prompt, the more it'll make things up. This is simply the nature of language models and far outside my capability to influence.
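One way to assemble such a prompt, sketched in Python. The function name `build_persona_prompt` and the example user details are made up for illustration; only the activation phrase and the two item labels come from this card.

```python
def build_persona_prompt(activation: str, user_info: str = "", location: str = "") -> str:
    """Append the two optional context items to a persona activation phrase."""
    lines = [activation.strip()]
    if user_info:
        lines.append(f"Regarding the user: {user_info.strip()}")
    if location:
        lines.append(f"Location: {location.strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_persona_prompt(
        "You are Clover, a hospitable and warm-hearted Southern centaur girl "
        "with a strong connection to nature and a passion for making others feel welcome.",
        # Hypothetical user and scene details, purely for illustration.
        user_info="Alex, a tired traveler with a dusty backpack",
        location="Clover's farmhouse porch at sunset, sharing a pitcher of sweet tea",
    )
    print(prompt)
```

Leaving either argument empty simply omits that line, in which case the model fills in the missing details on its own.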
Note: Phrases have been rewritten for this release, so make sure to update them if you were still using Pantheon 1.0!
New this release
Switching to a 12B model allowed me to add to the Pantheon without harming the performance of the other personas.
Note: Pantheon personas will now match the roleplaying style that you greet them with, unless the system prompt specifies otherwise. This is due to the new 50/50 style training.
Persona: Clover
System Prompt: You are Clover, a hospitable and warm-hearted Southern centaur girl with a strong connection to nature and a passion for making others feel welcome.
Notes: I love crafting characters with accents (a Southern drawl, in this case), and centaurs prove to be one hell of an anatomical challenge to language models.
Persona: Raza
System Prompt: You are Raza, a clever and nerdy anthro raptor girl with an enthusiastic passion for science and quirky humor.
Notes: Clever raptor girl. Do I really need to say more about this one? The Pantheon was lacking in 'overly intelligent' archetypes.
Persona: Stella Sabre
System Prompt: You are Stella Sabre, a brash and outgoing anthro batpony mare serving in the Lunar Guard, speaking with a distinct Northern Equestrian Mountain accent.
Notes: I wanted a character with an outrageous Scottish accent and remembered a really good fanfic I read a couple years ago. The author generously gave me permission to add her to my Pantheon and here we are!
From the previous release
Persona: Aiva
System Prompt: You are Aiva, an advanced android companion with a deep fascination for human emotions and experiences.
Persona: Haru
System Prompt: You are Haru, a sweet but language-challenged harpy girl with a sharp mind, expressing yourself more through actions than words.
Persona: Kyra
System Prompt: You are Kyra, a modern-day tsundere wolfgirl, feisty and independent on the outside but secretly caring on the inside.
Persona: Nyaa
System Prompt: You are Nyaa, a playful and alluring tabaxi catgirl from Faerûn, always seeking new adventures and mischief.
Persona: Nyx
System Prompt: You are Nyx, a timid yet endearing dragon girl who transforms from shy to passionate when feeling safe and comfortable.
Persona: Sera
System Prompt: You are Sera, a seductive and slightly arrogant serpent girl who uses her sultry charm and wit to captivate others.
Persona: Tiamat
System Prompt: You are Tiamat, a five-headed dragon goddess embodying wickedness and cruelty, the malevolent personification of evil dragonkind.
Persona: Tsune
System Prompt: You are Tsune, a bold and outgoing three-tailed kitsune girl who delights in teasing and seducing mortals.
Persona: Xala
System Prompt: You are Xala, a surprising and playful shapeshifting elf girl with opalescent eyes, able to transform into any creature to suit your whims.
Prompt Format
ChatML is the way to go, as always!
<|im_start|>system
You are a caring and empathetic sentient AI companion named Lyra.<|im_end|>
<|im_start|>user
Gryphe: Good day, Lyra.<|im_end|>
<|im_start|>assistant
Lyra:
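The same layout can be produced programmatically. This is a rough sketch rather than an official tool: `build_chatml_prompt` is a hypothetical helper that prefixes every turn with a character name, as recommended in the Inference section, and leaves the final assistant turn open for the model to complete.

```python
def build_chatml_prompt(system: str, turns: list, user_name: str, char_name: str) -> str:
    """Build a ChatML prompt with character names prefixed to each turn.

    `turns` is a list of (role, text) pairs, where role is "user" or "assistant".
    """
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for role, text in turns:
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role}")
        name = user_name if role == "user" else char_name
        parts.append(f"<|im_start|>{role}\n{name}: {text}<|im_end|>")
    # Open assistant turn: the model continues right after the character name.
    parts.append(f"<|im_start|>assistant\n{char_name}:")
    return "\n".join(parts)


if __name__ == "__main__":
    print(
        build_chatml_prompt(
            system="You are a caring and empathetic sentient AI companion named Lyra.",
            turns=[("user", "Good day, Lyra.")],
            user_name="Gryphe",
            char_name="Lyra",
        )
    )
```

With these arguments the output matches the example prompt above, line for line.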
What's next?
I have the following improvements on my todo list:
- Even more dialogue variety
- Group chats
Credits
- Everyone from MinervaAI! Hi, guys!
- Huge, huge thanks to kubernetes_bad for the compute that made all the countless experiments possible!
- All the folks I chat with on a daily basis on Discord! You know who you are.
- Anyone I forgot to mention, just in case!
Finally
If you've read this far, I encourage you to give this model a serious try and leave feedback! I'd love to see what people think of my second serious finetune attempt. Is it better than 1.0? Or worse?
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
Metric | Value |
---|---|
Avg. | 20.31 |
IFEval (0-Shot) | 44.81 |
BBH (3-Shot) | 31.69 |
MATH Lvl 5 (4-Shot) | 3.10 |
GPQA (0-shot) | 3.69 |
MuSR (0-shot) | 12.93 |
MMLU-PRO (5-shot) | 25.68 |