Awan LLM committed on
Commit f555a12
1 Parent(s): 7d5e6fa

Update README.md

Files changed (1):
  1. README.md +145 -0
---
license: apache-2.0
---
Based on Meta-Llama-3-8B-Instruct and governed by the Meta Llama 3 license agreement:
https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct


We have not benchmarked this model yet, so we do not know exactly how it scores, but we think real prompts and real-world usage are more telling anyway.


From our testing, this model:

- Refuses less
- Is less censored
- Follows requests better
- Replies in the requested format without adding unnecessary information

We are happy for anyone to try it out and give us feedback.
You can also try this model on our API at https://www.awanllm.com/


Training:
- Trained at a 2048 sequence length, while the base model uses 8192; from our testing it still handles the full 8192 context just fine.
- Trained on a modified and improved version of Cognitive Computations' (Eric Hartford's) Dolphin dataset: https://huggingface.co/datasets/cognitivecomputations/dolphin
- Training took around 2 days on 2x RTX 3090 on our own machine, using 4-bit loading and QLoRA (rank 64, alpha 128), resulting in ~2% trainable weights; a rough equivalent is sketched below.
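
For reference, the sketch below shows roughly equivalent QLoRA settings in plain transformers + peft, which is where the ~2% trainable-parameter figure comes from. It is illustrative only and is not our training script; the actual run used Axolotl with the config at the end of this card, and the base model id and helper calls here are assumptions based on standard Hugging Face tooling.

```python
# Minimal sketch of the QLoRA setup described above: 4-bit base model, LoRA rank 64, alpha 128.
# Illustrative only -- the actual training used Axolotl with the config shown later in this card.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # matches load_in_4bit: true
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches bf16: true
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=64,               # lora_r: 64
    lora_alpha=128,     # lora_alpha: 128
    lora_dropout=0.05,  # lora_dropout: 0.05
    task_type="CAUSAL_LM",
    # lora_target_linear: true targets every linear projection in the Llama blocks:
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # reports roughly 2% of parameters as trainable
```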


The goal for this model is to be less censored and still great at general tasks, like the previous Dolphin-based models by Eric Hartford.
We started training this BEFORE they launched their own full-weight-trained Llama-3-8B-Dolphin-2.9 with their own curated datasets and the newer "Dolphin 2.9" dataset, but we think this model is still a unique take on Llama 3 8B Instruct and the Dolphin dataset.
https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b


The difference from their Dolphin 2.9 model is that we trained this one using Meta's new Llama 3 Instruct format rather than the regular ChatML format that Dolphin models are usually trained with, because we think the model performs better with the format it was originally trained on.

Instruct format:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{{ system_prompt }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_1 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{{ model_answer_1 }}<|eot_id|><|start_header_id|>user<|end_header_id|>

{{ user_message_2 }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
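
If you use the model through Hugging Face transformers, the tokenizer's built-in chat template should produce this same layout, assuming the tokenizer config in the FP16 repo linked below ships the standard Llama 3 chat template. A minimal sketch, with placeholder messages:

```python
# Minimal sketch: build a Llama 3 Instruct prompt with the tokenizer's chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("AwanLLM/Meta-Llama-3-8B-Instruct-Dolfin")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what QLoRA is in two sentences."},
]

# add_generation_prompt=True appends the final assistant header so the model continues from there.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```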


Quants:

AWQ: https://huggingface.co/AwanLLM/Meta-Llama-3-8B-Instruct-Dolfin-AWQ

GGUF: https://huggingface.co/AwanLLM/Meta-Llama-3-8B-Instruct-Dolfin-v0.1-GGUF

FP16: https://huggingface.co/AwanLLM/Meta-Llama-3-8B-Instruct-Dolfin

ExLlamaV2:

4bpw: https://huggingface.co/AwanLLM/Meta-Llama-3-8B-Instruct-Dolfin-v0.1-exl2-h8-4bpw-exl2

8bpw: https://huggingface.co/AwanLLM/Meta-Llama-3-8B-Instruct-Dolfin-v0.1-exl2-h8-8bpw-exl2

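
For local use, the GGUF quant can be run with llama-cpp-python; the sketch below is an assumption about typical usage, and the .gguf filename is a placeholder you should replace with an actual file from the GGUF repo.

```python
# Minimal sketch of running the GGUF quant with llama-cpp-python.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="AwanLLM/Meta-Llama-3-8B-Instruct-Dolfin-v0.1-GGUF",
    filename="Meta-Llama-3-8B-Instruct-Dolfin-v0.1-Q4_K_M.gguf",  # placeholder filename, check the repo
)

# 8192 context to match the base model; n_gpu_layers=-1 offloads all layers to GPU when available.
llm = Llama(model_path=model_path, n_ctx=8192, n_gpu_layers=-1)

# Recent llama-cpp-python versions pick up the chat template stored in the GGUF metadata,
# so create_chat_completion should format messages in the Llama 3 Instruct layout shown above.
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "List three uses for a paperclip."},
    ],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```
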
[<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)

Axolotl Config:
```
base_model: Meta-Llama-3-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

train_on_inputs: false
group_by_length: false
load_in_8bit: false
load_in_4bit: true
strict: false
sequence_len: 2048
bf16: true
fp16: false
tf32: false
flash_attention: true

# Data
datasets:
  - path: flan1m-universal-uncensored-system-2048.jsonl
    type:
      system_prompt: ""
      system_format: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
      field_system: system
      field_instruction: input
      field_output: output
      format: "{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
      no_input_format: "{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

warmup_steps: 10
dataset_prepared_path: ./last_run_prepared

# Iterations
num_epochs: 1
saves_per_epoch: 4

# Evaluation
val_set_size: 0.01
eval_table_size:
eval_table_max_new_tokens:
eval_sample_packing: false
evals_per_epoch: 4

# LoRA
output_dir: ./qlora-out
adapter: qlora
lora_model_dir:
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
save_safetensors: true

# Sampling
sample_packing: true
pad_to_sequence_len: true

# Batching
gradient_accumulation_steps: 32
micro_batch_size: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true

# Optimizer
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 0.0002

# Misc
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
debug:
deepspeed: zero3_bf16.json
weight_decay: 0.1
special_tokens:
  pad_token: <|end_of_text|>
```