Delta-Vector committed
Commit 12e309a · verified · 1 parent: 4d4a5c9

Update README.md

Files changed (1): README.md (+0 −169)
README.md CHANGED
@@ -1,169 +0,0 @@
---
library_name: peft
tags:
- generated_from_trainer
datasets:
- PocketDoc/Dans-MemoryCore-CoreCurriculum-Small
- Nitral-AI/ARES-ShareGPT
- Gryphe/Sonnet3.5-SlimOrcaDedupCleaned-20k
- NewEden/Claude-Instruct-2.7K
- NewEden/Claude-Instruct-5K
base_model: NewEden_Phi-PT-merged-LIT
model-index:
- name: phi4-inst-out-r2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.6.0`
```yaml
25
- base_model: NewEden_Phi-PT-merged-LIT
26
- model_type: AutoModelForCausalLM
27
- tokenizer_type: AutoTokenizer
28
-
29
- plugins:
30
- - axolotl.integrations.liger.LigerPlugin
31
- liger_rope: true
32
- liger_rms_norm: true
33
- liger_swiglu: true
34
- liger_fused_linear_cross_entropy: true
35
-
36
-
37
- load_in_8bit: false
38
- load_in_4bit: false
39
- strict: false
40
-
41
- datasets:
42
- - path: PocketDoc/Dans-MemoryCore-CoreCurriculum-Small
43
- type: dan-chat-advanced
44
- - path: Nitral-AI/ARES-ShareGPT
45
- type: dan-chat-advanced
46
- - path: Gryphe/Sonnet3.5-SlimOrcaDedupCleaned-20k
47
- type: dan-chat-advanced
48
- - path: NewEden/Claude-Instruct-2.7K
49
- type: dan-chat-advanced
50
- - path: NewEden/Claude-Instruct-5K
51
- type: dan-chat-advanced
52
-
53
- shuffle_merged_datasets: true
54
- dataset_prepared_path: prepared_data
55
- val_set_size: 0.0
56
- output_dir: ./phi4-inst-out-r2
57
-
58
- sequence_len: 16384
59
- sample_packing: true
60
- pad_to_sequence_len: true
61
-
62
- adapter: lora
63
- lora_model_dir:
64
- lora_r: 128
65
- lora_alpha: 16
66
- lora_dropout: 0.05
67
- lora_target_modules:
68
- - gate_proj
69
- - down_proj
70
- - up_proj
71
- - q_proj
72
- - v_proj
73
- - k_proj
74
- - o_proj
75
-
76
- lora_modules_to_save:
77
- - embed_tokens
78
- - lm_head
79
-
80
-
81
- wandb_project: mag-phi
82
- wandb_entity:
83
- wandb_watch:
84
- wandb_name: inst-attempt-02
85
- wandb_log_model:
86
-
87
- gradient_accumulation_steps: 4
88
- micro_batch_size: 2
89
- num_epochs: 4
90
- optimizer: paged_ademamix_8bit
91
- lr_scheduler: cosine
92
- learning_rate: 0.000025
93
-
94
- train_on_inputs: false
95
- group_by_length: false
96
- bf16: auto
97
- fp16:
98
- tf32: false
99
-
100
- gradient_checkpointing: unsloth
101
- early_stopping_patience:
102
- resume_from_checkpoint:
103
- local_rank:
104
- logging_steps: 1
105
- xformers_attention:
106
- flash_attention: true
107
-
108
- warmup_steps: 15
109
- evals_per_epoch: 4
110
- eval_table_size:
111
- eval_max_new_tokens: 128
112
- saves_per_epoch: 2
113
- debug:
114
- deepspeed: /workspace/axolotl/deepspeed_configs/zero3_bf16_cpuoffload_params.json
115
- weight_decay: 0.01
116
- fsdp:
117
- fsdp_config:
118
-
119
- ```

</details><br>
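
As a rough illustration of the adapter settings in the config above, the sketch below expresses the same `lora_*` hyperparameters as a PEFT `LoraConfig`. It is a minimal approximation for readers working outside Axolotl, not the object Axolotl constructs internally.

```python
# Minimal sketch: a PEFT LoraConfig mirroring the lora_* settings above.
# This approximates the Axolotl config; it is not Axolotl's internal code.
from peft import LoraConfig

lora_config = LoraConfig(
    r=128,             # lora_r
    lora_alpha=16,     # lora_alpha
    lora_dropout=0.05, # lora_dropout
    target_modules=[
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "v_proj", "k_proj", "o_proj",
    ],
    modules_to_save=["embed_tokens", "lm_head"],  # lora_modules_to_save
    task_type="CAUSAL_LM",
)
```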

# phi4-inst-out-r2

This model is a LoRA fine-tune of NewEden_Phi-PT-merged-LIT, trained on the PocketDoc/Dans-MemoryCore-CoreCurriculum-Small, Nitral-AI/ARES-ShareGPT, Gryphe/Sonnet3.5-SlimOrcaDedupCleaned-20k, NewEden/Claude-Instruct-2.7K and NewEden/Claude-Instruct-5K datasets.
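
Because this repository contains a PEFT (LoRA) adapter rather than full model weights, it is loaded on top of the base model. The sketch below shows one way to do that; the `path/to/...` locations are placeholders, not published repository ids.

```python
# Minimal sketch for loading the LoRA adapter on top of its base model.
# "path/to/NewEden_Phi-PT-merged-LIT" and "path/to/phi4-inst-out-r2" are
# placeholders; substitute the real base-model and adapter locations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "path/to/NewEden_Phi-PT-merged-LIT", torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("path/to/NewEden_Phi-PT-merged-LIT")
model = PeftModel.from_pretrained(base, "path/to/phi4-inst-out-r2")

prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```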

## Model description

This is a rank-128 LoRA adapter (with `embed_tokens` and `lm_head` also saved) trained on top of NewEden_Phi-PT-merged-LIT using Axolotl 0.6.0, with a 16,384-token sequence length and sample packing.

## Intended uses & limitations

More information needed

## Training and evaluation data

The adapter was trained on the five datasets listed in the metadata above, formatted with the `dan-chat-advanced` chat template. No validation split was held out (`val_set_size: 0.0`).

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2.5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32 (see the note below)
- total_eval_batch_size: 8
- optimizer: PAGED_ADEMAMIX_8BIT (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 15
- num_epochs: 4.0
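
The total train batch size follows from the per-device batch size, the gradient-accumulation steps and the device count; a quick check of that arithmetic:

```python
# Effective (total) train batch size =
#   per-device batch size * gradient accumulation steps * number of devices.
micro_batch_size = 2
gradient_accumulation_steps = 4
num_devices = 4

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
assert total_train_batch_size == 32  # matches the value reported above
```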

### Training results

No evaluation metrics were logged for this run (no validation split was used).

### Framework versions

- PEFT 0.14.0
- Transformers 4.48.1
- PyTorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
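
To sanity-check a local environment against the versions listed above, a small snippet like the following can be used (it only assumes the packages are installed):

```python
# Minimal sketch: print installed versions of the packages listed above
# so they can be compared against the versions used for training.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("peft", "transformers", "torch", "datasets", "tokenizers"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```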