ludis/tsukasa-llama-3-8b-qlora

big thanks to lore for the 8xH100 gpus

training

base model is meta llama 3 8b instruct. i trained it on pippa first, then trained the resulting model on limarp, both at 8k context for 2 epochs each

gen settings

i would start with every sampler off, temperature at 1 and min p at 0.05. i got good results from this, but u can also try the gen settings from shori, which are copy pasted below

Main choice (may have repetition issues)
- Temperature: 1.0; Min-P: 0.05-0.10; Presence Penalty: 0.35-0.45
Alternative 1 (appears to solve repetition issues while being coherent, but responses might be less truthful)
- Temperature: 2.40-2.50; Min-P: 0.40; Frequency penalty: 0.10-0.15; Temperature last.
Alternative 2
- Mirostat type: 2, Mirostat Tau: 2.80-3.00; Mirostat Eta: 0.0175-0.0200; neutralize or disable all other samplers
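
for reference, here's a rough python sketch of what the starting point above (temperature 1, min p 0.05, everything else off) does to the next-token distribution. the function names are made up for illustration and this is not how any particular backend implements it. it assumes temperature is applied before min p, which makes no difference at temperature 1.

```python
import math
import random


def min_p_filter(logits, min_p=0.05, temperature=1.0):
    """Return {token_index: probability} after temperature scaling and min-p filtering."""
    # Temperature scaling (a no-op at temperature 1.0).
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Min-p: drop tokens whose probability is below min_p * (top probability).
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    # Renormalize the surviving tokens so they sum to 1.
    norm = sum(kept.values())
    return {i: p / norm for i, p in kept.items()}


def sample_token(logits, min_p=0.05, temperature=1.0, rng=random):
    """Draw one token index from the filtered distribution."""
    dist = min_p_filter(logits, min_p, temperature)
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

raising min p cuts more of the low-probability tail, which is why the high-temperature alternative pairs it with a much larger min p value.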

prompting

use the llama 3 instruct format

use <|eot_id|> as the stopping sequence/string/token
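
a minimal python sketch of putting a prompt together in this format and cutting the output at the stop string. the helper names are hypothetical, and the header layout (role names system/user/assistant, two newlines after each header) follows meta's published llama 3 instruct template rather than anything specific to this model.

```python
STOP = "<|eot_id|>"


def build_llama3_prompt(system, turns):
    """Build a Llama 3 instruct prompt.

    system: system prompt text (may be empty).
    turns: list of (role, text) pairs, role being "user" or "assistant".
    """
    def block(role, text):
        # Each message is a role header, a blank line, the text, then <|eot_id|>.
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{text}{STOP}"

    prompt = "<|begin_of_text|>"
    if system:
        prompt += block("system", system)
    for role, text in turns:
        prompt += block(role, text)
    # Leave an open assistant header so the model writes the next reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


def cut_at_stop(generated, stop=STOP):
    """Keep only the text before the first stop string."""
    return generated.split(stop, 1)[0]
```

most frontends do the trimming for you once <|eot_id|> is set as a stop string, so cut_at_stop only matters if you are calling a raw completion endpoint yourself.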

ST jsons: instruct context

agnaistic prompt:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>{{#if system}}<|begin_of_text|><|start_header_id|>system<|end_header_id|>{{system}}<|eot_id|>{{/if}}Write {{char}}'s next reply in a fictional roleplay chat between {{#each bot}}{{.name}}, {{/each}}{{char}} and {{user}}.

{{char}}'s Persona: {{personality}}

{{#if memory}}
Important details:
{{memory}}
{{/if}}

{{#if example_dialogue}}This is how {{char}} should talk:
{{example_dialogue}}{{/if}}

The scenario of the conversation: {{scenario}}

Then the roleplay chat between {{#each bot}}{{.name}}, {{/each}}{{char}} and {{user}} begins.<|eot_id|>

{{#each msg}}{{#if .isbot}}<|start_header_id|>response<|end_header_id|>{{/if}}{{#if .isuser}}<|start_header_id|>user<|end_header_id|>{{/if}}{{.name}}: {{.msg}}<|eot_id|>
{{/each}}
{{#if ujb}}<|begin_of_text|><|start_header_id|>system<|end_header_id|>{{ujb}}<|eot_id|>{{/if}}
<|start_header_id|>response<|end_header_id|>{{post}}
