---
language:
- en
tags:
- open-assistant
- falcon
license: "unknown"
datasets:
- toanbku/oa-df
---

- Datasets: https://huggingface.co/datasets/toanbku/oa-df
- Training log: https://wandb.ai/toanbku/supervised-finetuning/runs/w1l8j7n6/overview

### Command

```
export BS=8
deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 61000 trainer_sft.py \
    --config defaults oa-falcon-7b-top1 oasst_df \
    --cache_dir /home/ubuntu/OA/model/model_training/.cache \
    --per_device_eval_batch_size $BS --per_device_train_batch_size $BS \
    --deepspeed
```

### Config

```
oa-falcon-7b-top1:
  dtype: bf16
  log_dir: "falcon_log_7b"
  learning_rate: 1e-5
  model_name: "OpenAssistant/falcon-7b-sft-top1-696"
  deepspeed_config: configs/zero_config.json
  output_dir: falcon
  weight_decay: 0.0
  max_length: 2048
  save_strategy: steps
  eval_steps: 80
  save_steps: 80
  warmup_steps: 4
  gradient_checkpointing: true
  gradient_accumulation_steps: 2
  per_device_train_batch_size: 2
  per_device_eval_batch_size: 4
  num_train_epochs: 4
  save_total_limit: 2
  residual_dropout: 0.2
  residual_dropout_lima: true

oasst_df:
  save_strategy: epoch
  datasets:
    - oasst_export:
        lang: "en"
        hf_dataset_name: toanbku/oa-df
        val_split: 0.05
```
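
Note that the values above are not necessarily the final ones: the three section names passed to `--config defaults oa-falcon-7b-top1 oasst_df` appear to be merged left to right, with command-line flags overriding all of them, which is why the resolved `trainig_conf` in the training log below shows `save_strategy='epoch'` (from `oasst_df`) and a per-device batch size of 8 (from the CLI) rather than the values listed here. A minimal sketch of that apparent precedence, with illustrative dicts standing in for the YAML sections:

```
# Illustrative sketch of the apparent config precedence (not the trainer's
# actual code): later --config sections override earlier ones, and CLI flags
# override the config files, matching the resolved values in the log below.
defaults = {"save_strategy": "steps", "per_device_train_batch_size": 2}
oa_falcon_7b_top1 = {"save_strategy": "steps", "learning_rate": 1e-5}
oasst_df = {"save_strategy": "epoch"}
cli = {"per_device_train_batch_size": 8, "per_device_eval_batch_size": 8}

conf = {}
for layer in (defaults, oa_falcon_7b_top1, oasst_df, cli):
    conf.update(layer)  # last writer wins

print(conf["save_strategy"], conf["per_device_train_batch_size"])  # epoch 8
```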

### Demo

- **input_text:** `<|prompter|>Provide information about Dwarves Foundation company<|endoftext|><|assistant|>`
- **output:**
- **log:**

```
python ./test.py
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[2023-07-17 11:32:36,837] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|████████████████████████████████████████| 2/2 [00:11<00:00, 5.98s/it]
The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/transformers/generation/utils.py:1219: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
```
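
The `test.py` used above is not included in this card. The following is a minimal sketch of what such a script might look like, assuming the weights are loaded with `transformers` (Falcon's custom `RWForCausalLM` requires `trust_remote_code=True`) and the Open-Assistant prompt format from the demo; the model id below is the base checkpoint from the config and is a placeholder for the fine-tuned output directory:

```
# Hypothetical test.py sketch (not the actual script used above).
# Assumes transformers + accelerate and a bf16-capable GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "OpenAssistant/falcon-7b-sft-top1-696"  # placeholder: point at the fine-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Falcon ships custom RWForCausalLM modeling code
    device_map="auto",
)

# Open-Assistant chat format: a prompter turn, then the assistant tag to complete.
input_text = "<|prompter|>Provide information about Dwarves Foundation company<|endoftext|><|assistant|>"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # Falcon has no pad token (eos_token_id=11 in the log)
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```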

----

### Training log

```
(cuda118) [RedmondAI] ubuntu@oa-server-8:~/OA/model/model_training$ deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 61000 trainer_sft.py --config defaults oa-falcon-7b-top1 oasst_df --cache_dir /home/ubuntu/OA/model/model_training/.cache --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --deepspeed
[2023-07-17 16:21:13,138] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:16,536] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-07-17 16:21:16,536] [INFO] [runner.py:555:main] cmd = /home/ubuntu/mambaforge/envs/cuda118/bin/python3.10 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=61000 --enable_each_rank_log=None trainer_sft.py --config defaults oa-falcon-7b-top1 oasst_df --cache_dir /home/ubuntu/OA/model/model_training/.cache --per_device_eval_batch_size 8 --per_device_train_batch_size 8 --deepspeed
[2023-07-17 16:21:17,929] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:20,292] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-07-17 16:21:20,292] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-07-17 16:21:20,292] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-07-17 16:21:20,292] [INFO] [launch.py:163:main] dist_world_size=8
[2023-07-17 16:21:20,292] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2023-07-17 16:21:24,714] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:24,805] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,000] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,151] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,228] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,251] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,295] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,299] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
trainig_conf = Namespace(rng_seed=2703368087, learning_rate='1e-5', gradient_checkpointing=True, gradient_accumulation_steps=2, per_device_train_batch_size=8, per_device_eval_batch_size=8, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon='1e-12', weight_decay=0.0, warmup_steps=4, eval_steps=80, save_strategy='epoch', save_steps=80, max_length=2048, val_max_length=None, num_train_epochs=4, logging_steps=10, max_grad_norm=2.0, save_total_limit=2, dtype='bf16', eval_accumulation_steps=None, freeze_layer=None, datasets=[{'oasst_export': {'lang': 'en', 'hf_dataset_name': 'toanbku/oa-df', 'val_split': 0.05}}], datasets_extra=[], cache_dir='/home/ubuntu/OA/model/model_training/.cache', loss_fn='CrossEntropyLoss', eval_size=None, log_dir='falcon_log_7b', quantization=False, seq2seqmodel=False, poly_eps=1.0, fuse_gelu=True, log_wandb=True, samples_mixing=False, verbose=False, output_dir='falcon', use_custom_sampler=False, random_offset_probability=0.8, label_masking=True, residual_dropout=0.2, use_flash_attention=False, sort_by_length=False, use_system_prefix=False, system_prefix='You are Joi, a large language model trained by Open-Assistant. Answer as concisely as possible.\nKnowledge cutoff: 2021-09-01\nCurrent date: 2023-03-12', use_system_tag=False, system_property_dropout=0.5, system_add_length=False, per_digit_tokens=False, is_reward_model=False, residual_dropout_lima=True, deepspeed_config='configs/zero_config.json', peft_model=False, peft_type='lora', model_name='OpenAssistant/falcon-7b-sft-top1-696', wandb_entity='toanbku', local_rank=0, deepspeed=True, resume_from_checkpoint=False, show_dataset_stats=False, world_size=8)
[2023-07-17 16:21:25,864] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:25,864] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:25,864] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-07-17 16:21:25,952] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:25,952] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,311] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,312] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,320] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,320] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,407] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,407] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,511] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,512] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,558] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,558] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,618] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,619] [INFO] [comm.py:594:init_distributed] cdb=None
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
Tokenizer sanity check:
Type: PreTrainedTokenizerFast
special_tokens_map: {'eos_token': '<|endoftext|>', 'sep_token': '<|endoftext|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|prompter|>', '>>SUFFIX<<', '<|prefix_begin|>', '>>INTRODUCTION<<', '>>QUESTION<<', '>>SUMMARY<<', '<|prefix_end|>', '>>DOMAIN<<', '<|assistant|>', '<|system|>', '>>TITLE<<', '>>COMMENT<<', '>>MIDDLE<<', '>>PREFIX<<', '>>ANSWER<<', '>>ABSTRACT<<']}
Using bos_token, but it is not set yet.
bos_token='None', bos_token_id=None
eos_token='<|endoftext|>', eos_token_id=11
prompter_token_id=65028, assistant_token_id=65025
encoding result: {'input_ids': [65028, 60, 28, 11, 65024, 13318, 37, 445, 193, 7055, 37, 204, 28, 193, 11723, 37, 20906, 193, 11, 65025, 44, 28, 11, 65028, 60, 29, 11, 65024, 7055, 37, 204, 28, 193, 13318, 37, 445, 193, 11723, 37, 20906, 193, 11, 65025, 44, 29, 11], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
0: 65028 -> "<|prompter|>"
1: 60 -> "Q"
2: 28 -> "1"
3: 11 -> "<|endoftext|>"
4: 65024 -> "<|system|>"
5: 13318 -> "lang"
6: 37 -> ":"
7: 445 -> " en"
8: 193 -> "
"
9: 7055 -> "length"
10: 37 -> ":"
11: 204 -> " "
12: 28 -> "1"
13: 193 -> "
"
14: 11723 -> "context"
15: 37 -> ":"
16: 20906 -> " ctx"
17: 193 -> "
"
18: 11 -> "<|endoftext|>"
19: 65025 -> "<|assistant|>"
20: 44 -> "A"
21: 28 -> "1"
22: 11 -> "<|endoftext|>"
23: 65028 -> "<|prompter|>"
24: 60 -> "Q"
25: 29 -> "2"
26: 11 -> "<|endoftext|>"
27: 65024 -> "<|system|>"
28: 7055 -> "length"
29: 37 -> ":"
30: 204 -> " "
31: 28 -> "1"
32: 193 -> "
"
33: 13318 -> "lang"
34: 37 -> ":"
35: 445 -> " en"
36: 193 -> "
"
37: 11723 -> "context"
38: 37 -> ":"
39: 20906 -> " ctx"
40: 193 -> "
"
41: 11 -> "<|endoftext|>"
42: 65025 -> "<|assistant|>"
43: 44 -> "A"
44: 29 -> "2"
45: 11 -> "<|endoftext|>"
message_indices: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3]
Downloading and preparing dataset json/toanbku--oa-df to /home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96...
Downloading data: 100%|████████████████████████████████████████| 50.4k/50.4k [00:00<00:00, 45.8MB/s]
Downloading data: 100%|████████████████████████████████████████| 11.5k/11.5k [00:00<00:00, 38.3MB/s]
Downloading data files: 100%|████████████████████████████████████████| 2/2 [00:00<00:00, 11.19it/s]
Extracting data files: 100%|████████████████████████████████████████| 2/2 [00:00<00:00, 1782.53it/s]
Dataset json downloaded and prepared to /home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96. Subsequent calls will reuse this data.
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 11.1MB/s]
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 10.9MB/s]
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 9.35MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 10.4MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 14.0MB/s]
Downloading shards:   0%|                                        | 0/8 [00:00<?, ?it/s]Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 9.62MB/s]
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script:   0%|                                        | 0.00/4.20k [00:00<?, ?B/s]Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 10.7MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 10.6MB/s]
Downloading shards:   0%|                                        | 0/8 [00:00<?, ?it/s]Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision. | 10.5M/1.92G [00:00<00:20, 91.9MB/s]
Downloading shards:   0%|                                        | 0/8 [00:00<?, ?it/s]Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision. | 21.0M/1.92G [00:00<00:19, 96.0MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading (…)l-00001-of-00008.bin: 100%|████████████████████████████████████████| 1.92G/1.92G [00:33<00:00, 57.1MB/s]
Downloading (…)l-00002-of-00008.bin: 100%|████████████████████████████████████████| 1.99G/1.99G [00:36<00:00, 54.4MB/s]
Downloading (…)l-00003-of-00008.bin: 100%|████████████████████████████████████████| 1.91G/1.91G [00:37<00:00, 50.7MB/s]
Downloading (…)l-00004-of-00008.bin: 100%|████████████████████████████████████████| 1.91G/1.91G [00:35<00:00, 53.1MB/s]
Downloading (…)l-00005-of-00008.bin: 100%|████████████████████████████████████████| 1.99G/1.99G [00:36<00:00, 53.9MB/s]
Downloading (…)l-00006-of-00008.bin: 100%|████████████████████████████████████████| 1.91G/1.91G [00:35<00:00, 54.3MB/s]
Downloading (…)l-00007-of-00008.bin: 100%|████████████████████████████████████████| 1.91G/1.91G [00:37<00:00, 50.4MB/s]
Downloading (…)l-00008-of-00008.bin: 100%|██████████████████████████████████████████| 921M/921M [00:18<00:00, 50.7MB/s]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:32<00:00, 34.10s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:33<00:00, 34.14s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:32<00:00, 34.11s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:33<00:00, 34.17s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:33<00:00, 34.13s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:33<00:00, 34.16s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:33<00:00, 34.14s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:33<00:00, 34.20s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:15<00:00, 1.92s/it]
Downloading (…)neration_config.json: 100%|████████████████████████████████████████| 116/116 [00:00<00:00, 638kB/s]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:16<00:00, 2.03s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:15<00:00, 2.00s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 75%|██████████████████████████████          | 6/8 [00:16<00:05, 2.63s/it]Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:18<00:00, 2.29s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:19<00:00, 2.42s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:19<00:00, 2.47s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:19<00:00, 2.48s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:19<00:00, 2.47s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Resizing embeddings to 65040
Number of trainable parameters: 6921M
wandb: Currently logged in as: toanbku. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /home/ubuntu/OA/model/model_training/wandb/run-20230717_162731-w1l8j7n6
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run OpenAssistant/falcon-7b-sft-top1-696-falcon_log_7b-finetuned
wandb: ⭐️ View project at https://wandb.ai/toanbku/supervised-finetuning
wandb: 🚀 View run at https://wandb.ai/toanbku/supervised-finetuning/runs/w1l8j7n6
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Creating extension directory /home/ubuntu/.cache/torch_extensions/py310_cu118/fused_adam...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py310_cu118/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
[1/3] /home/ubuntu/mambaforge/envs/cuda118/bin/nvcc -ccbin /home/ubuntu/mambaforge/envs/cuda118/bin/x86_64-conda-linux-gnu-cc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/TH -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/THC -isystem /home/ubuntu/mambaforge/envs/cuda118/include -isystem /home/ubuntu/mambaforge/envs/cuda118/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -std=c++17 -c /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
[2/3] /home/ubuntu/mambaforge/envs/cuda118/bin/x86_64-conda-linux-gnu-c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/TH -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/THC -isystem /home/ubuntu/mambaforge/envs/cuda118/include -isystem /home/ubuntu/mambaforge/envs/cuda118/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DBF16_AVAILABLE -c /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
[3/3] /home/ubuntu/mambaforge/envs/cuda118/bin/x86_64-conda-linux-gnu-c++ fused_adam_frontend.o multi_tensor_adam.cuda.o -shared -L/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/home/ubuntu/mambaforge/envs/cuda118/lib64 -lcudart -o fused_adam.so
Loading extension module fused_adam...
Time to load fused_adam op: 29.23199462890625 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.244072675704956 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.24746584892273 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.145082473754883 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.245840072631836 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.245012760162354 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.24890160560608 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.146907329559326 seconds
Rank: 6 partition count [8] and sizes[(865224176, False)]
Rank: 4 partition count [8] and sizes[(865224176, False)]
Rank: 3 partition count [8] and sizes[(865224176, False)]
Rank: 7 partition count [8] and sizes[(865224176, False)]
Rank: 5 partition count [8] and sizes[(865224176, False)]
Rank: 0 partition count [8] and sizes[(865224176, False)]
Rank: 2 partition count [8] and sizes[(865224176, False)]
Rank: 1 partition count [8] and sizes[(865224176, False)]
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
  0%|                                        | 0/4 [00:00<?, ?it/s]You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
 25%|██████████                              | 1/4 [00:00<00:02, 1.05it/s/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
{'train_runtime': 603.8881, 'train_samples_per_second': 0.212, 'train_steps_per_second': 0.007, 'train_loss': 1.333984375, 'epoch': 4.0}
100%|████████████████████████████████████████| 4/4 [10:03<00:00, 150.97s/it]
[2023-07-17 16:38:44,427] [INFO] [launch.py:347:main] Process 46522 exits successfully.
[2023-07-17 16:38:44,428] [INFO] [launch.py:347:main] Process 46521 exits successfully.
[2023-07-17 16:38:44,428] [INFO] [launch.py:347:main] Process 46520 exits successfully.
[2023-07-17 16:38:44,429] [INFO] [launch.py:347:main] Process 46518 exits successfully.
[2023-07-17 16:38:44,429] [INFO] [launch.py:347:main] Process 46523 exits successfully.
[2023-07-17 16:38:45,431] [INFO] [launch.py:347:main] Process 46524 exits successfully.
[2023-07-17 16:38:45,432] [INFO] [launch.py:347:main] Process 46519 exits successfully.
wandb: Waiting for W&B process to finish... (success).
wandb:
wandb: Run history:
wandb: train/epoch ▁
wandb: train/global_step ▁
wandb: train/total_flos ▁
wandb: train/train_loss ▁
wandb: train/train_runtime ▁
wandb: train/train_samples_per_second ▁
wandb: train/train_steps_per_second ▁
wandb:
wandb: Run summary:
wandb: train/epoch 4.0
wandb: train/global_step 4
wandb: train/total_flos 1903271472005120.0
wandb: train/train_loss 1.33398
wandb: train/train_runtime 603.8881
wandb: train/train_samples_per_second 0.212
wandb: train/train_steps_per_second 0.007
wandb:
wandb: 🚀 View run OpenAssistant/falcon-7b-sft-top1-696-falcon_log_7b-finetuned at: https://wandb.ai/toanbku/supervised-finetuning/runs/w1l8j7n6
wandb: Synced 6 W&B file(s), 0 media file(s), 2 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230717_162731-w1l8j7n6/logs
[2023-07-17 16:40:50,566] [INFO] [launch.py:347:main] Process 46517 exits successfully.
```
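
As a sanity check on the step count above (a sketch, assuming the usual one-optimizer-step-per-effective-batch accounting): with 8 GPUs, a per-device batch size of 8, and 2 gradient-accumulation steps, the effective batch of 128 already covers the entire 32-sample train split, so each epoch is a single optimizer step and 4 epochs give exactly the `train/global_step 4` reported by W&B.

```
# Back-of-the-envelope check of the step count in the log above.
import math

gpus = 8
per_device_train_batch_size = 8   # CLI override; the config file had 2
gradient_accumulation_steps = 2
train_samples = 32                # "len(train)=32" in the log
num_train_epochs = 4

effective_batch = gpus * per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(train_samples / effective_batch)  # ceil(32/128) = 1

print(effective_batch)                     # 128
print(steps_per_epoch * num_train_epochs)  # 4, matching train/global_step
```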