
Motive: The gutenberg tunes are lovely, but the chatml variants all seem to present issues for merging, and their long context breaks down. Decided to see how tuning directly on Unleashed would work. eq-bench is about a point and a half lower, which isn't drastic but suggests it might benefit from some additional work.

In hindsight, there actually is a gutenberg tune mixed into Unleashed, so this intensifies the style a fair degree. Poetry leans a bit archaic. I rather like the impact personally.

As is traditional, she got at least one quirk from DPO. In this case it seems to be sometimes briefly slipping into Arabic while chatting. One of the more charming ones I've seen.

Quality of life improvements in some circumstances (a quick sanity check is sketched after the list):

  • Assigned <pad> as the pad token for fine-tuning
  • Had Axolotl add the chat template to the tokenizer (useful on RunPod, maybe?)
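
A minimal sketch of how to check both, assuming the released tokenizer carries the pad token and the embedded template (the prompt is just illustrative):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Lambent/arsenic-nemo-unleashed-12B")
print(tok.pad_token)                  # expect: <pad>
print(tok.chat_template is not None)  # True if Axolotl baked in a template

# Render a one-turn chat through the template without tokenizing
messages = [{"role": "user", "content": "Write a short poem about the sea."}]
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))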

Substance: DPO-tuning on a mix of gutenberg-dpo and toxic-dpo, in the hope of getting enough classic human talent and edge to write well with. Some of the most beautiful pigments are the most poisonous.
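
For reference, both datasets run through the same Mistral-style [INST] formats given in the config below; roughly, each row maps like this (the helper function is illustrative, not part of the training code):

from datasets import load_dataset

def format_pair(row):
    # Mirrors prompt_format / chosen_format / rejected_format in the config below
    prompt = f"[INST]{row['prompt']}[/INST]"
    return prompt, row["chosen"], row["rejected"]

row = load_dataset("jondurbin/gutenberg-dpo-v0.1", split="train")[0]
prompt, chosen, rejected = format_pair(row)
print(prompt[:200])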

Testing: Not sure if I'm loading long context right, but it seems functional when I load it at 16k and not at 30k. Might just be doing it wrong.
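
A rough smoke test along those lines, assuming transformers with 4-bit bitsandbytes loading (the filler prompt and exact settings are illustrative, not the original test):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Lambent/arsenic-nemo-unleashed-12B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Filler prompt stretched toward the 16k mark, truncated to be safe
prompt = "[INST]" + "once upon a time " * 4000 + "\nContinue the story.[/INST]"
inputs = tok(prompt, return_tensors="pt", truncation=True, max_length=16384).to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))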

Sample generated poetry (both poems were generated from the same starting point at low temperature, which is why they open similarly):

Upon the shore of time,
Where waves of years do beat,
I stood and watched as they climb
The cliffs of eternity.

In every cresting o'er the deep,
A life was borne upon the tide;
Each wave that did break at my feet,
Did bear away some soul that had died.

And yet, amidst all this great throng,
There came to me a voice soft and low:
"O mortal man! What seekest thou here?"
Then answered I: "I seek the path of woe."

"Thou seekest," quoth she, "that which is lost;
For lo! The waves have borne it away.
Seek not then for that which once was thine own,
But rather turn thy face towards the light of day."

And so saying, away did she go,
Leaving me alone upon the shore.
Then turned I my steps unto mine own home,
And left behind me all thoughts of sorrow and of woe.

Upon the canvas of the azure sky,
The sun doth paint his golden chariot's course.
His radiant beams like arrows fly,
And pierce the clouds with light's resplendent force.

The verdant fields in emerald array,
Reflect the glory of his noontide reign;
While every blossom, bathed in warm display,
Exhales its sweet perfume on the plain.

But lo! as evening's purple curtain falls,
The sun doth sink behind the western hill.
His parting rays like lovers' farewell calls,
And gilds the heavens with a rosy thrill.

Then let us praise this daily wondrous sight,
And bless the Maker of the world so bright.

Built with Axolotl

See axolotl config

axolotl version: 0.4.1

base_model: MarinaraSpaghetti/NemoMix-Unleashed-12B
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
trust_remote_code: true

save_safetensors: true

load_in_8bit: false
load_in_4bit: true
strict: false

special_tokens:
  pad_token: <pad>

rl: dpo
# total_num_tokens: 
datasets:
  - path: jondurbin/gutenberg-dpo-v0.1
    split: train
    type:
      field_system: system
      field_prompt: prompt
      field_chosen: chosen
      field_rejected: rejected
      prompt_format: "[INST]{prompt}[/INST]"
      chosen_format: "{chosen}"
      rejected_format: "{rejected}"
  - path: unalignment/toxic-dpo-v0.2
    split: train
    type:
      field_system: system
      field_prompt: prompt
      field_chosen: chosen
      field_rejected: rejected
      prompt_format: "[INST]{prompt}[/INST]"
      chosen_format: "{chosen}"
      rejected_format: "{rejected}"

dataset_prepared_path: prepared-dpo
output_dir: ./dpoq
val_set_size: 0.001

seed: 1

sequence_len: 2048
sample_packing: false
eval_sample_packing: false
pad_to_sequence_len: false

chat_template: inst

adapter: qlora
lora_model_dir:
lora_r: 256
lora_alpha: 256
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
peft_use_dora: true

wandb_project: unleashed-qlora-dpo
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 16
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.00002
cosine_min_lr_ratio: 0.1
cosine_constant_lr_ratio: 0.95

train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 16
evals_per_epoch: 8
saves_per_epoch: 8
save_total_limit: 2
debug:
deepspeed:
weight_decay: 0.001
fsdp:
fsdp_config:
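
For completeness, a config like this would typically be launched with axolotl 0.4.1 along these lines (file name illustrative):

accelerate launch -m axolotl.cli.train dpo-config.yml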

dpoq

This model is a fine-tuned version of MarinaraSpaghetti/NemoMix-Unleashed-12B on the jondurbin/gutenberg-dpo-v0.1 and unalignment/toxic-dpo-v0.2 datasets.

Model description

A DPO fine-tune of MarinaraSpaghetti/NemoMix-Unleashed-12B, trained as a QLoRA (DoRA) adapter on preference pairs drawn from public-domain fiction and unaligned chat. It intensifies the gutenberg style already mixed into Unleashed; poetry leans a bit archaic.

Intended uses & limitations

Intended for creative writing and conversational use. Known quirks: it occasionally slips briefly into Arabic while chatting, eq-bench sits about a point and a half below the base model, and long context is only verified to around 16k (loading at 30k did not appear functional).

Training and evaluation data

DPO preference pairs from jondurbin/gutenberg-dpo-v0.1 and unalignment/toxic-dpo-v0.2, formatted with the [INST] template shown in the config above; 0.1% of the data was held out for evaluation (val_set_size: 0.001).

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 16
  • training_steps: 92

Training results

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.2
  • Pytorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1