---
license: apache-2.0
base_model: OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1
pipeline_tag: text-generation
---

# QuantFactory/ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1-GGUF

This is a quantized version of [OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1](https://huggingface.co/OwenArli/ArliAI-Llama-3-8B-Instruct-Dolfin-v0.1), created using llama.cpp.

# Model Description

Based on Meta-Llama-3-8B-Instruct, and governed by the Meta Llama 3 License agreement:
https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

We have not benchmarked this model yet, so we don't know exactly how it scores, but we think real prompts and real usage are more telling anyway.

From our testing, this model:

- Refuses less often
- Is less censored
- Follows requests better
- Replies in requested formats better, without adding unnecessary information

We are happy for anyone to try it out and give some feedback.

Training:

- 2048 sequence length, while the base model has an 8192 sequence length. From our testing, it still handles the full 8192 context just fine.
- Trained on a modified and improved version of the Dolphin dataset from Eric Hartford's Cognitive Computations: https://huggingface.co/datasets/cognitivecomputations/dolphin
- Training took around 2 days on 2x RTX 3090 on our own machine, using 4-bit loading and QLoRA with rank 64 and alpha 128, resulting in ~2% trainable weights.
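
As a rough sanity check on the "~2% trainable weights" figure, the adapter size can be estimated from the layer shapes. This is a minimal sketch, not the authors' calculation: the shapes and the base parameter count are assumptions taken from the public Llama 3 8B configuration, and it assumes adapters on every linear projection in each decoder block (as `lora_target_linear: true` in the Axolotl config suggests).

```python
# Rough count of LoRA adapter parameters for a Llama-3-8B-shaped model.
# The shapes below are assumptions from the public Llama 3 8B config,
# not values stated in this model card.
HIDDEN = 4096                # hidden_size
INTERMEDIATE = 14336         # intermediate_size of the MLP
KV_DIM = 1024                # 8 key/value heads x 128 head dim (grouped-query attention)
LAYERS = 32                  # num_hidden_layers
BASE_PARAMS = 8_030_261_248  # approximate parameter count of the base model

RANK = 64    # lora_r from the card
ALPHA = 128  # lora_alpha from the card

# (in_features, out_features) of each linear layer in one decoder block.
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int) -> int:
    """LoRA adds A (rank x in) and B (out x rank) per adapted linear layer."""
    per_block = sum(
        rank * (fan_in + fan_out) for fan_in, fan_out in LINEAR_SHAPES.values()
    )
    return per_block * LAYERS


adapter = lora_params(RANK)
fraction = adapter / (BASE_PARAMS + adapter)
scaling = ALPHA / RANK  # the LoRA update is scaled by alpha / r

print(f"adapter parameters:  {adapter:,}")
print(f"trainable fraction:  {fraction:.2%}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumptions the adapter comes to roughly 168M parameters, about 2% of the total, which is consistent with the figure quoted above.
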

The goal for this model is to be less censored and great at general tasks, like the previous Dolphin-based models by Eric Hartford.
We started training this BEFORE they launched their own full-weight-trained Llama-3-8B-Dolphin-2.9 with their own curated datasets and the newer "Dolphin 2.9" dataset, but we think this model is still a unique take on Llama 3 8B Instruct and the Dolphin dataset.
https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b

The difference from their Dolphin 2.9 model is that we trained this one using Meta's new Llama 3 instruct format and not the regular ChatML format that Dolphin models are usually trained on.
This is because we think the model performs better using the format it was originally trained on.

Instruct format:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
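
The template above can be assembled programmatically. The following is a minimal sketch, not part of the original release: the helper name `build_llama3_prompt` and the example messages are hypothetical. It assumes each header is followed by two newlines, matching the `\n\n` used in the Axolotl config.

```python
from typing import Dict, List


def build_llama3_prompt(system_prompt: str, turns: List[Dict[str, str]]) -> str:
    """Render chat turns into the Llama 3 instruct layout.

    `turns` is a list of {"role": "user" | "assistant", "content": ...} dicts.
    The returned string ends with an open assistant header so that the model
    generates the next reply.
    """

    def block(role: str, content: str) -> str:
        # One message: role header, blank line, content, end-of-turn token.
        return f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"

    prompt = "<|begin_of_text|>" + block("system", system_prompt)
    for turn in turns:
        prompt += block(turn["role"], turn["content"])
    # Leave the assistant header open; generation continues from here.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


# Hypothetical conversation, for illustration only.
example = build_llama3_prompt(
    "You are a helpful assistant.",
    [
        {"role": "user", "content": "Name one prime number."},
        {"role": "assistant", "content": "2"},
        {"role": "user", "content": "Name another."},
    ],
)
print(example)
```

Some runtimes prepend `<|begin_of_text|>` on their own when tokenizing; in that case the leading token should probably be dropped from the string to avoid a duplicate.
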

[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

Axolotl Config:
```
base_model: Meta-Llama-3-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

train_on_inputs: false
group_by_length: false
load_in_8bit: false
load_in_4bit: true
strict: false
sequence_len: 2048
bf16: true
fp16: false
tf32: false
flash_attention: true

# Data
datasets:
  - path: flan1m-universal-uncensored-system-2048.jsonl
    type:
      system_prompt: ""
      system_format: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
      field_system: system
      field_instruction: input
      field_output: output
      format: "{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
      no_input_format: "{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

warmup_steps: 10
dataset_prepared_path: ./last_run_prepared

# Iterations
num_epochs: 1
saves_per_epoch: 4

# Evaluation
val_set_size: 0.01
eval_table_size:
eval_table_max_new_tokens:
eval_sample_packing: false
evals_per_epoch: 4

# LoRA
output_dir: ./qlora-out
adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
save_safetensors: true

# Sampling
sample_packing: true
pad_to_sequence_len: true

# Batching
gradient_accumulation_steps: 32
micro_batch_size: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true

# Optimizer
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

# Misc
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
debug:
deepspeed: zero3_bf16.json
weight_decay: 0.1
special_tokens:
  pad_token: <|end_of_text|>
```
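
To make the batching and optimizer settings in the config more concrete, here is a small sketch of the effective batch size and the learning-rate curve they imply. Two values are assumptions rather than config entries: `NUM_GPUS` comes from the 2x RTX 3090 mentioned in the training notes and assumes data-parallel training, and `TOTAL_STEPS` is purely hypothetical, since the real step count depends on the dataset size. The schedule is the usual linear-warmup-then-cosine-decay shape and may differ in detail from what the trainer actually ran.

```python
import math

# Values copied from the config above.
MICRO_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 32
SEQUENCE_LEN = 2048
LEARNING_RATE = 2e-4
WARMUP_STEPS = 10

# Assumptions, not stated in the config itself.
NUM_GPUS = 2        # the card mentions 2x RTX 3090; assumes data parallelism
TOTAL_STEPS = 1000  # hypothetical; the real value depends on dataset size

# Sequences consumed per optimizer step across all GPUs.
sequences_per_step = MICRO_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_GPUS
# With packing and padding to sequence_len, every sequence holds 2048 tokens.
tokens_per_step = sequences_per_step * SEQUENCE_LEN


def lr_at(step: int) -> float:
    """Linear warmup to the peak rate, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"sequences per optimizer step: {sequences_per_step}")
print(f"tokens per optimizer step:    {tokens_per_step:,}")
for step in (0, 5, 10, 500, 1000):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

With these numbers each optimizer step covers 256 packed sequences, or 524,288 token positions including padding.
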