---
language:
- en
tags:
- open-assistant
- falcon
license: "unknown"
datasets:
- toanbku/oa-df
---

- Datasets: https://huggingface.co/datasets/toanbku/oa-df
- Training log: https://wandb.ai/toanbku/supervised-finetuning/runs/w1l8j7n6/overview

### Command

```
export BS=8
deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 61000 trainer_sft.py \
    --config defaults oa-falcon-7b-top1 oasst_df \
    --cache_dir /home/ubuntu/OA/model/model_training/.cache \
    --per_device_eval_batch_size $BS --per_device_train_batch_size $BS \
    --deepspeed
```

### Config

```
oa-falcon-7b-top1:
  dtype: bf16
  log_dir: "falcon_log_7b"
  learning_rate: 1e-5
  model_name: "OpenAssistant/falcon-7b-sft-top1-696"
  deepspeed_config: configs/zero_config.json
  output_dir: falcon
  weight_decay: 0.0
  max_length: 2048
  save_strategy: steps
  eval_steps: 80
  save_steps: 80
  warmup_steps: 4
  gradient_checkpointing: true
  gradient_accumulation_steps: 2
  per_device_train_batch_size: 2
  per_device_eval_batch_size: 4
  num_train_epochs: 4
  save_total_limit: 2
  residual_dropout: 0.2
  residual_dropout_lima: true

oasst_df:
  save_strategy: epoch
  datasets:
    - oasst_export:
        lang: "en"
        hf_dataset_name: toanbku/oa-df
        val_split: 0.05
```
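
Note that the values above are not necessarily the final ones: the three section names passed to `--config defaults oa-falcon-7b-top1 oasst_df` appear to be merged left to right, with command-line flags overriding all of them, which is why the resolved `trainig_conf` in the training log below shows `save_strategy='epoch'` (from `oasst_df`) and a per-device batch size of 8 (from the CLI) rather than the values listed here. A minimal sketch of that apparent precedence, with illustrative dicts standing in for the YAML sections:

```
# Illustrative sketch of the apparent config precedence (not the trainer's
# actual code): later --config sections override earlier ones, and CLI flags
# override the config files, matching the resolved values in the log below.
defaults = {"save_strategy": "steps", "per_device_train_batch_size": 2}
oa_falcon_7b_top1 = {"save_strategy": "steps", "learning_rate": 1e-5}
oasst_df = {"save_strategy": "epoch"}
cli = {"per_device_train_batch_size": 8, "per_device_eval_batch_size": 8}

conf = {}
for layer in (defaults, oa_falcon_7b_top1, oasst_df, cli):
    conf.update(layer)  # last writer wins

print(conf["save_strategy"], conf["per_device_train_batch_size"])  # epoch 8
```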

### Demo

- **input_text:** `<|prompter|>Provide information about Dwarves Foundation company<|endoftext|><|assistant|>`
- **output:**
- **log:**

```
python ./test.py
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
[2023-07-17 11:32:36,837] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|████████████████████████████████████████| 2/2 [00:11<00:00, 5.98s/it]
The model 'RWForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM', 'RoFormerForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM', 'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/transformers/generation/utils.py:1219: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation)
  warnings.warn(
Setting `pad_token_id` to `eos_token_id`:11 for open-end generation.
```
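
The `test.py` used above is not included in this card. The following is a minimal sketch of what such a script might look like, assuming the weights are loaded with `transformers` (Falcon's custom `RWForCausalLM` requires `trust_remote_code=True`) and the Open-Assistant prompt format from the demo; the model id below is the base checkpoint from the config and is a placeholder for the fine-tuned output directory:

```
# Hypothetical test.py sketch (not the actual script used above).
# Assumes transformers + accelerate and a bf16-capable GPU.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "OpenAssistant/falcon-7b-sft-top1-696"  # placeholder: point at the fine-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Falcon ships custom RWForCausalLM modeling code
    device_map="auto",
)

# Open-Assistant chat format: a prompter turn, then the assistant tag to complete.
input_text = "<|prompter|>Provide information about Dwarves Foundation company<|endoftext|><|assistant|>"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # Falcon has no pad token (eos_token_id=11 in the log)
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```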

----

### Training log

```
(cuda118) [RedmondAI] ubuntu@oa-server-8:~/OA/model/model_training$ deepspeed --include=localhost:0,1,2,3,4,5,6,7 --master_port 61000 trainer_sft.py --config defaults oa-falcon-7b-top1 oasst_df --cache_dir /home/ubuntu/OA/model/model_training/.cache --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --deepspeed
[2023-07-17 16:21:13,138] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:16,536] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-07-17 16:21:16,536] [INFO] [runner.py:555:main] cmd = /home/ubuntu/mambaforge/envs/cuda118/bin/python3.10 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgMywgNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=61000 --enable_each_rank_log=None trainer_sft.py --config defaults oa-falcon-7b-top1 oasst_df --cache_dir /home/ubuntu/OA/model/model_training/.cache --per_device_eval_batch_size 8 --per_device_train_batch_size 8 --deepspeed
[2023-07-17 16:21:17,929] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:20,292] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]}
[2023-07-17 16:21:20,292] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=8, node_rank=0
[2023-07-17 16:21:20,292] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3, 4, 5, 6, 7]})
[2023-07-17 16:21:20,292] [INFO] [launch.py:163:main] dist_world_size=8
[2023-07-17 16:21:20,292] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
[2023-07-17 16:21:24,714] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:24,805] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,000] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,151] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,228] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,251] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,295] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-07-17 16:21:25,299] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
trainig_conf = Namespace(rng_seed=2703368087, learning_rate='1e-5', gradient_checkpointing=True, gradient_accumulation_steps=2, per_device_train_batch_size=8, per_device_eval_batch_size=8, adam_beta1=0.9, adam_beta2=0.95, adam_epsilon='1e-12', weight_decay=0.0, warmup_steps=4, eval_steps=80, save_strategy='epoch', save_steps=80, max_length=2048, val_max_length=None, num_train_epochs=4, logging_steps=10, max_grad_norm=2.0, save_total_limit=2, dtype='bf16', eval_accumulation_steps=None, freeze_layer=None, datasets=[{'oasst_export': {'lang': 'en', 'hf_dataset_name': 'toanbku/oa-df', 'val_split': 0.05}}], datasets_extra=[], cache_dir='/home/ubuntu/OA/model/model_training/.cache', loss_fn='CrossEntropyLoss', eval_size=None, log_dir='falcon_log_7b', quantization=False, seq2seqmodel=False, poly_eps=1.0, fuse_gelu=True, log_wandb=True, samples_mixing=False, verbose=False, output_dir='falcon', use_custom_sampler=False, random_offset_probability=0.8, label_masking=True, residual_dropout=0.2, use_flash_attention=False, sort_by_length=False, use_system_prefix=False, system_prefix='You are Joi, a large language model trained by Open-Assistant. Answer as concisely as possible.\nKnowledge cutoff: 2021-09-01\nCurrent date: 2023-03-12', use_system_tag=False, system_property_dropout=0.5, system_add_length=False, per_digit_tokens=False, is_reward_model=False, residual_dropout_lima=True, deepspeed_config='configs/zero_config.json', peft_model=False, peft_type='lora', model_name='OpenAssistant/falcon-7b-sft-top1-696', wandb_entity='toanbku', local_rank=0, deepspeed=True, resume_from_checkpoint=False, show_dataset_stats=False, world_size=8)
[2023-07-17 16:21:25,864] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:25,864] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:25,864] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-07-17 16:21:25,952] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:25,952] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,311] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,312] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,320] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,320] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,407] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,407] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,511] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,512] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,558] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,558] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-17 16:21:26,618] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-17 16:21:26,619] [INFO] [comm.py:594:init_distributed] cdb=None
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
RNG seed: 2703368087
Tokenizer sanity check:
Type: PreTrainedTokenizerFast
special_tokens_map: {'eos_token': '<|endoftext|>', 'sep_token': '<|endoftext|>', 'pad_token': '<|endoftext|>', 'additional_special_tokens': ['<|prompter|>', '>>SUFFIX<<', '<|prefix_begin|>', '>>INTRODUCTION<<', '>>QUESTION<<', '>>SUMMARY<<', '<|prefix_end|>', '>>DOMAIN<<', '<|assistant|>', '<|system|>', '>>TITLE<<', '>>COMMENT<<', '>>MIDDLE<<', '>>PREFIX<<', '>>ANSWER<<', '>>ABSTRACT<<']}
Using bos_token, but it is not set yet.
bos_token='None', bos_token_id=None
eos_token='<|endoftext|>', eos_token_id=11
prompter_token_id=65028, assistant_token_id=65025
encoding result: {'input_ids': [65028, 60, 28, 11, 65024, 13318, 37, 445, 193, 7055, 37, 204, 28, 193, 11723, 37, 20906, 193, 11, 65025, 44, 28, 11, 65028, 60, 29, 11, 65024, 7055, 37, 204, 28, 193, 13318, 37, 445, 193, 11723, 37, 20906, 193, 11, 65025, 44, 29, 11], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
0: 65028 -> "<|prompter|>"
1: 60 -> "Q"
2: 28 -> "1"
3: 11 -> "<|endoftext|>"
4: 65024 -> "<|system|>"
5: 13318 -> "lang"
6: 37 -> ":"
7: 445 -> " en"
8: 193 -> "
"
9: 7055 -> "length"
10: 37 -> ":"
11: 204 -> " "
12: 28 -> "1"
13: 193 -> "
"
14: 11723 -> "context"
15: 37 -> ":"
16: 20906 -> " ctx"
17: 193 -> "
"
18: 11 -> "<|endoftext|>"
19: 65025 -> "<|assistant|>"
20: 44 -> "A"
21: 28 -> "1"
22: 11 -> "<|endoftext|>"
23: 65028 -> "<|prompter|>"
24: 60 -> "Q"
25: 29 -> "2"
26: 11 -> "<|endoftext|>"
27: 65024 -> "<|system|>"
28: 7055 -> "length"
29: 37 -> ":"
30: 204 -> " "
31: 28 -> "1"
32: 193 -> "
"
33: 13318 -> "lang"
34: 37 -> ":"
35: 445 -> " en"
36: 193 -> "
"
37: 11723 -> "context"
38: 37 -> ":"
39: 20906 -> " ctx"
40: 193 -> "
"
41: 11 -> "<|endoftext|>"
42: 65025 -> "<|assistant|>"
43: 44 -> "A"
44: 29 -> "2"
45: 11 -> "<|endoftext|>"
message_indices: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3]
Downloading and preparing dataset json/toanbku--oa-df to /home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96...
Downloading data: 100%|████████████████████████████████████████| 50.4k/50.4k [00:00<00:00, 45.8MB/s]
Downloading data: 100%|████████████████████████████████████████| 11.5k/11.5k [00:00<00:00, 38.3MB/s]
Downloading data files: 100%|████████████████████████████████████████| 2/2 [00:00<00:00, 11.19it/s]
Extracting data files: 100%|████████████████████████████████████████| 2/2 [00:00<00:00, 1782.53it/s]
Dataset json downloaded and prepared to /home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96. Subsequent calls will reuse this data.
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Found cached dataset json (/home/ubuntu/.cache/huggingface/datasets/toanbku___json/toanbku--oa-df-811abf2c8473a2c5/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96)
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/utils/data/dataset.py:348: UserWarning: Length of split at index 1 is 0. This might result in an empty dataset.
  warnings.warn(f"Length of split at index {i} is 0. "
OASST HF dataset toanbku/oa-df: len(train)=32, len(val)=0
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 11.1MB/s]
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 10.9MB/s]
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 9.35MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 10.4MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 14.0MB/s]
Downloading shards:   0%|                                        | 0/8 [00:00<?, ?it/s]Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 9.62MB/s]
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script:   0%|                                        | 0.00/4.20k [00:00<?, ?B/s]Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 10.7MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading builder script: 100%|████████████████████████████████████████| 4.20k/4.20k [00:00<00:00, 10.6MB/s]
Downloading shards:   0%|                                        | 0/8 [00:00<?, ?it/s]Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision. | 10.5M/1.92G [00:00<00:20, 91.9MB/s]
Downloading shards:   0%|                                        | 0/8 [00:00<?, ?it/s]Explicitly passing a `revision` is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision. | 21.0M/1.92G [00:00<00:19, 96.0MB/s]
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Downloading (…)l-00001-of-00008.bin: 100%|████████████████████████████████████████| 1.92G/1.92G [00:33<00:00, 57.1MB/s]
Downloading (…)l-00002-of-00008.bin: 100%|████████████████████████████████████████| 1.99G/1.99G [00:36<00:00, 54.4MB/s]
Downloading (…)l-00003-of-00008.bin: 100%|████████████████████████████████████████| 1.91G/1.91G [00:37<00:00, 50.7MB/s]
Downloading (…)l-00004-of-00008.bin: 100%|████████████████████████████████████████| 1.91G/1.91G [00:35<00:00, 53.1MB/s]
Downloading (…)l-00005-of-00008.bin: 100%|████████████████████████████████████████| 1.99G/1.99G [00:36<00:00, 53.9MB/s]
Downloading (…)l-00006-of-00008.bin: 100%|████████████████████████████████████████| 1.91G/1.91G [00:35<00:00, 54.3MB/s]
Downloading (…)l-00007-of-00008.bin: 100%|████████████████████████████████████████| 1.91G/1.91G [00:37<00:00, 50.4MB/s]
Downloading (…)l-00008-of-00008.bin: 100%|██████████████████████████████████████████| 921M/921M [00:18<00:00, 50.7MB/s]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:32<00:00, 34.10s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:33<00:00, 34.14s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:32<00:00, 34.11s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:33<00:00, 34.17s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:33<00:00, 34.13s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:33<00:00, 34.16s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:33<00:00, 34.14s/it]
Downloading shards: 100%|████████████████████████████████████████| 8/8 [04:33<00:00, 34.20s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:15<00:00, 1.92s/it]
Downloading (…)neration_config.json: 100%|████████████████████████████████████████| 116/116 [00:00<00:00, 638kB/s]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:16<00:00, 2.03s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:15<00:00, 2.00s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 75%|██████████████████████████████          | 6/8 [00:16<00:05, 2.63s/it]Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:18<00:00, 2.29s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:19<00:00, 2.42s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:19<00:00, 2.47s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:19<00:00, 2.48s/it]
Loading checkpoint shards: 100%|████████████████████████████████████████| 8/8 [00:19<00:00, 2.47s/it]
Resizing embeddings to 65040
Number of trainable parameters: 6921M
Resizing embeddings to 65040
Number of trainable parameters: 6921M
wandb: Currently logged in as: toanbku. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /home/ubuntu/OA/model/model_training/wandb/run-20230717_162731-w1l8j7n6
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run OpenAssistant/falcon-7b-sft-top1-696-falcon_log_7b-finetuned
wandb: ⭐️ View project at https://wandb.ai/toanbku/supervised-finetuning
wandb: 🚀 View run at https://wandb.ai/toanbku/supervised-finetuning/runs/w1l8j7n6
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Creating extension directory /home/ubuntu/.cache/torch_extensions/py310_cu118/fused_adam...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py310_cu118/fused_adam/build.ninja...
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...
[1/3] /home/ubuntu/mambaforge/envs/cuda118/bin/nvcc -ccbin /home/ubuntu/mambaforge/envs/cuda118/bin/x86_64-conda-linux-gnu-cc -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/TH -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/THC -isystem /home/ubuntu/mambaforge/envs/cuda118/include -isystem /home/ubuntu/mambaforge/envs/cuda118/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -std=c++17 -c /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o
[2/3] /home/ubuntu/mambaforge/envs/cuda118/bin/x86_64-conda-linux-gnu-c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/includes -I/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/TH -isystem /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/include/THC -isystem /home/ubuntu/mambaforge/envs/cuda118/include -isystem /home/ubuntu/mambaforge/envs/cuda118/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DBF16_AVAILABLE -c /home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o
[3/3] /home/ubuntu/mambaforge/envs/cuda118/bin/x86_64-conda-linux-gnu-c++ fused_adam_frontend.o multi_tensor_adam.cuda.o -shared -L/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/home/ubuntu/mambaforge/envs/cuda118/lib64 -lcudart -o fused_adam.so
Loading extension module fused_adam...
Time to load fused_adam op: 29.23199462890625 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.244072675704956 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.24746584892273 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.145082473754883 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.245840072631836 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.245012760162354 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.24890160560608 seconds
Loading extension module fused_adam...
Time to load fused_adam op: 29.146907329559326 seconds
Rank: 6 partition count [8] and sizes[(865224176, False)]
Rank: 4 partition count [8] and sizes[(865224176, False)]
Rank: 3 partition count [8] and sizes[(865224176, False)]
Rank: 7 partition count [8] and sizes[(865224176, False)]
Rank: 5 partition count [8] and sizes[(865224176, False)]
Rank: 0 partition count [8] and sizes[(865224176, False)]
Rank: 2 partition count [8] and sizes[(865224176, False)]
Rank: 1 partition count [8] and sizes[(865224176, False)]
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
  0%|                                        | 0/4 [00:00<?, ?it/s]You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
 25%|██████████                              | 1/4 [00:00<00:02, 1.05it/s/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/ubuntu/mambaforge/envs/cuda118/lib/python3.10/site-packages/torch/nn/modules/module.py:1802: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
{'train_runtime': 603.8881, 'train_samples_per_second': 0.212, 'train_steps_per_second': 0.007, 'train_loss': 1.333984375, 'epoch': 4.0}
100%|████████████████████████████████████████| 4/4 [10:03<00:00, 150.97s/it]
[2023-07-17 16:38:44,427] [INFO] [launch.py:347:main] Process 46522 exits successfully.
[2023-07-17 16:38:44,428] [INFO] [launch.py:347:main] Process 46521 exits successfully.
[2023-07-17 16:38:44,428] [INFO] [launch.py:347:main] Process 46520 exits successfully.
[2023-07-17 16:38:44,429] [INFO] [launch.py:347:main] Process 46518 exits successfully.
[2023-07-17 16:38:44,429] [INFO] [launch.py:347:main] Process 46523 exits successfully.
[2023-07-17 16:38:45,431] [INFO] [launch.py:347:main] Process 46524 exits successfully.
[2023-07-17 16:38:45,432] [INFO] [launch.py:347:main] Process 46519 exits successfully.
wandb: Waiting for W&B process to finish... (success).
wandb:
wandb: Run history:
wandb: train/epoch ▁
wandb: train/global_step ▁
wandb: train/total_flos ▁
wandb: train/train_loss ▁
wandb: train/train_runtime ▁
wandb: train/train_samples_per_second ▁
wandb: train/train_steps_per_second ▁
wandb:
wandb: Run summary:
wandb: train/epoch 4.0
wandb: train/global_step 4
wandb: train/total_flos 1903271472005120.0
wandb: train/train_loss 1.33398
wandb: train/train_runtime 603.8881
wandb: train/train_samples_per_second 0.212
wandb: train/train_steps_per_second 0.007
wandb:
wandb: 🚀 View run OpenAssistant/falcon-7b-sft-top1-696-falcon_log_7b-finetuned at: https://wandb.ai/toanbku/supervised-finetuning/runs/w1l8j7n6
wandb: Synced 6 W&B file(s), 0 media file(s), 2 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230717_162731-w1l8j7n6/logs
[2023-07-17 16:40:50,566] [INFO] [launch.py:347:main] Process 46517 exits successfully.
```
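
As a sanity check on the step count above (a sketch, assuming the usual one-optimizer-step-per-effective-batch accounting): with 8 GPUs, a per-device batch size of 8, and 2 gradient-accumulation steps, the effective batch of 128 already covers the entire 32-sample train split, so each epoch is a single optimizer step and 4 epochs give exactly the `train/global_step 4` reported by W&B.

```
# Back-of-the-envelope check of the step count in the log above.
import math

gpus = 8
per_device_train_batch_size = 8   # CLI override; the config file had 2
gradient_accumulation_steps = 2
train_samples = 32                # "len(train)=32" in the log
num_train_epochs = 4

effective_batch = gpus * per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(train_samples / effective_batch)  # ceil(32/128) = 1

print(effective_batch)                     # 128
print(steps_per_epoch * num_train_epochs)  # 4, matching train/global_step
```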